Abstract

Structure functions are a measure of the partonic structure of hadrons, which is important
for any process which involves colliding hadrons. They are a key ingredient for deriving parton
distributions in nucleons. In recent years dramatic progress has been made in the understanding
of the nucleon structure and the precision of its partonic content, due to vast theoretical progress
and the availability of new high precision measurements. This review gives an overview of present
structure function and related data, and of the most recent techniques used to extract parton
distribution functions to describe the structure of the proton. Special attention is given to the
determination of the uncertainties on the parton distributions.

1 Introduction 



In order to obtain very high energies more easily, many particle colliders have hadrons, in particular
protons and antiprotons, in the initial state. On the 30th of March 2010 the LHC started a
two year operation period at the highest collision energy ever produced in the laboratory, namely 7
TeV in the centre of mass of the particle collisions. The LHC accelerates proton beams in opposite
directions, each beam with an energy of 3.5 TeV. With these collisions the LHC has entered the so-called
Terascale energy regime, which is expected to lead to new insights into the dynamics of particle physics,
and will possibly revolutionise the field. Hadrons are however composite particles, consisting of quarks
and gluons, and these partons are the fundamental constituents that are involved in the collisions. A
full and detailed understanding of the structure of the proton will be required to extract the most
physics from the LHC data.

Hadrons are bound together by the strong force, described by Quantum Chromodynamics (QCD).
The strong coupling constant $\alpha_S(\mu^2)$ runs with the energy scale $\mu^2$ of a process, decreasing as $\mu^2$
increases, a phenomenon known as asymptotic freedom. Hence, $\alpha_S(\mu^2)$ is very large if $\mu$ is at the scale
of nonperturbative physics, about 1 GeV, but $\alpha_S(\mu^2) \ll 1$ if $\mu^2 \gg 1$ GeV$^2$, and perturbation theory can
be used. Because of the strong force it is difficult to perform analytic calculations of scattering processes
involving hadronic particles from first principles. However, the weakening of $\alpha_S(\mu^2)$ at higher scales
leads to the Factorisation Theorem, which separates processes into nonperturbative parton distribution
functions (PDFs), which describe the composition of the proton and can be determined from experiment,
and perturbative coefficient functions associated with higher scales, which are calculated as a power series
in $\alpha_S(\mu^2)$. Thus in order to understand any of the results of these experiments one needs to understand
how the incoming hadron is made up from the constituent quarks and gluons, the interactions of which
we then know how to calculate using perturbation theory, as long as there is a large scale in the process
so that perturbation theory is applicable.

The production of any particle, say a Higgs boson, at a hadron collider can be determined by the
cross section of the parton-parton collision to produce the Higgs, convoluted with the probabilities to find
these partons within the incoming hadrons. We can use deep inelastic scattering (DIS) experiments
to probe the structure of hadrons and the fundamental interactions of quarks, gluons, and leptons. In
DIS experiments a lepton probes a target nucleon or nucleus via exchange of an electroweak boson.
In fact, DIS was the first method to directly detect quarks in hadrons, in an experiment at SLAC in
1969 [1]. In DIS an elementary particle transfers large energy-momentum to a hadron, which then breaks
up inelastically. Essentially it knocks a quark out of the target hadron, which then hadronises. The
assumption is that for high energy-momentum transfers, corrections to these basic processes from gluon
exchange between quarks can be treated in QCD perturbation theory. The hadronisation process, where
perturbation theory cannot be used, takes place over much longer timescales and larger distance scales
than the initial point-like electroweak scattering. However, we also have to consider the nonperturbative
initial state.

The advent of the HERA collider in particular has led to significant progress in the last 10-15 years 
on the precise understanding of the structure of the proton, especially in the kinematic region of small 
momentum fractions x carried by the partons with respect to the proton momentum. DIS experiments 
extract information from the lepton scattering cross sections to measure Structure Functions of the 
target, which are directly related to the PDFs. 

Apart from DIS experimental data, results from jet production at hadron colliders, Drell-Yan,
prompt photon and heavy vector boson experimental data can be used to constrain the partons in
the hadrons. For over 20 years several groups have used all the available experimental data to make
global fits to extract PDFs which can be used in studies that involve colliding hadrons. Nowadays these
datasets contain thousands of data points from over a dozen different experiments, and fits to these data
are performed with next to leading, and even next to next to leading order QCD tools. The original
PDF extractions were relatively simple [2, 3, 4], but the increased precision of the measurements, especially
of the available DIS data, has led to steady progress towards more sophisticated extractions of the PDFs,
notably on the strange sea asymmetry, the treatment of heavy quarks, the constraints on the gluon
(which is not directly probed in DIS as it is invisible to the electroweak boson), the determination of the
PDFs to higher perturbative orders in the strong coupling constant, etc. Moreover, for the experimentalists
it is often equally important to know what the uncertainties on the PDFs are. In recent years the two
main groups providing general fits (CTEQ and MRST/MSTW) have developed a prescription for determining
uncertainties, and have now been joined by several other groups (NNPDF, ABKM, GJR). Also
the HERA experimental collaborations have combined their data and produced PDFs with uncertainty
bands (HERAPDF).

In this paper we will review the present understanding of the structure of the proton, and the
uncertainties on such determinations. In the remainder of this section we introduce the formalism,
kinematics, the important factorisation hypothesis, the QCD corrections, and a short overview of the
latest data relevant for the extraction of the nucleon structure. Section 2 discusses the determination of
parton distribution functions. In Section 3 the different present approaches to the parton distribution
determinations are presented, together with the determination of the uncertainties. Section 4 discusses
some theoretical sources of uncertainties in more detail. In Section 5 some specific PDFs suited for LO
Monte Carlo generator programs are presented. Section 6 gives an outlook for future measurements
and developments and Section 7 summarises the main points of the review.



1.1 Structure Functions - Kinematics 

We consider deep inelastic scattering of electrons (or neutrinos) as a typical example. Let us first examine
electron scattering from a hadron $H$ of mass $m_H$, i.e.

$$ e(p_1) + H(P) \to e(p_2) + X , \tag{1} $$

where $X$ is an arbitrary final state. To lowest order in the electromagnetic charge $e$, the electron couples
to the hadron through a virtual photon. This can be seen in Fig. 1, where we also show the corresponding
diagram for neutrino interaction via a $W$ boson. Concentrating first on the electromagnetic scattering
process, the lowest order QED amplitude is

$$ i\mathcal{M} = i e^2\, \bar u(p_2)\gamma^\mu u(p_1)\, \frac{1}{q^2}\, \langle X|J_\mu|H,P\rangle , \qquad q = p_1 - p_2 . \tag{2} $$

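In the hadron rest frame $P = (m_H, \mathbf{0})$, $p_1 = (E_1, \mathbf{p}_1)$ and $p_2 = (E_2, \mathbf{p}_2)$. The basic relativistic invariant
variables are

$$ \nu = P\cdot q = m_H(E_1 - E_2) , \qquad Q^2 = -q^2 = 2p_1\cdot p_2 = 2E_1E_2(1-\cos\theta) , \tag{3} $$

where we have neglected the electron mass, so that $E_1 = |\mathbf{p}_1|$, $E_2 = |\mathbf{p}_2|$, and $\theta$ is the electron scattering
angle. Clearly $Q^2 \ge 0$ and also

$$ m_X^2 = (P+q)^2 \ge m_H^2 \quad\Rightarrow\quad Q^2 \le 2\nu . \tag{4} $$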

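The standard expression for the differential cross section gives

$$ d\sigma = \frac{1}{F} \sum_X \frac{d^3p_2}{(2\pi)^3\, 2E_2}\, (2\pi)^4 \delta^4(p_X + p_2 - P - p_1)\, \frac{1}{2}\sum_{e\,\mathrm{spins}} |\mathcal{M}|^2 , \tag{5} $$

where $F$ is the flux factor, $F = 4E_1 m_H$, in the hadron rest frame. From Eq. (2)

$$ \sum_{e\,\mathrm{spins}} |\mathcal{M}|^2 = \frac{e^4}{(q^2)^2}\, L^{\mu\nu}\, \langle H,P|J_\nu|X\rangle \langle X|J_\mu|H,P\rangle , \tag{6} $$

where, setting $m_e = 0$,

$$ L_{\mu\nu} = 4\left(p_{1,\mu}\,p_{2,\nu} + p_{1,\nu}\,p_{2,\mu} - g_{\mu\nu}\, p_1\cdot p_2\right) . \tag{7} $$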

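If we define

$$ W_H^{\nu\mu}(q,P) = \frac{1}{4\pi}\sum_X (2\pi)^4 \delta^4(q + P - p_X)\, \langle H,P|J^\nu|X\rangle\langle X|J^\mu|H,P\rangle , \tag{8} $$

where we implicitly average over the hadron spin in this definition, then the cross section formula Eq.
(5) becomes

$$ \frac{d\sigma}{d^3p_2} = \frac{e^4}{8(2\pi)^2 E_1 m_H E_2\,(Q^2)^2}\, L_{\mu\nu} W_H^{\nu\mu}(q,P) . \tag{9} $$

By virtue of conservation of the electromagnetic current, $(p_X - P)_\mu \langle X|J^\mu|H,P\rangle = 0$, we have
$q_\mu W_H^{\nu\mu}(q,P) = q_\nu W_H^{\nu\mu}(q,P) = 0$. The most general Lorentz covariant form compatible with this is

$$ W_H^{\nu\mu}(q,P) = \left(-g^{\nu\mu} + \frac{q^\nu q^\mu}{q^2}\right) W_1 + \left(P^\nu - \frac{\nu}{q^2}\, q^\nu\right)\left(P^\mu - \frac{\nu}{q^2}\, q^\mu\right) W_2 , \tag{10} $$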
where $W_{1,2}$ are Lorentz scalar functions characteristic of the hadron $H$, which depend on the two
variables $Q^2$ and $\nu$. In writing Eq. (10) we have neglected a possible term involving the $\epsilon$-tensor, but
this can be excluded by using parity invariance. To calculate the contraction in Eq. (9) we may use
the fact that current conservation also leads to $L_{\mu\nu}q^\mu = L_{\mu\nu}q^\nu = 0$, so that from Eq. (7) and Eq. (10)
we have,

$$ L_{\mu\nu}W_H^{\nu\mu} = 8\,p_1\cdot p_2\, W_1 + 4\left(2\,p_1\cdot P\; p_2\cdot P - P^2\, p_1\cdot p_2\right) W_2 $$
$$ \qquad\quad\; = 4Q^2\, W_1 + 2m_H^2\left(4E_1E_2 - Q^2\right) W_2 , \tag{11} $$

where in the second line we have used $p_1\cdot p_2 = \tfrac{1}{2}Q^2$, if $m_e = 0$, together with $p_1\cdot P = m_H E_1$,
$p_2\cdot P = m_H E_2$.

Figure 1: The kinematics for deep inelastic scattering.

In the limit of large momentum transfer, $Q^2 \sim O(\nu) \to \infty$, we can define dimensionless variables
$$ x = \frac{Q^2}{2\nu} , \qquad y = \frac{\nu}{m_H E_1} = 1 - \frac{E_2}{E_1} , \tag{12} $$

which stay fixed. It is easy to see that by definition

$$ 0 \le x \le 1 , \qquad 0 \le y \le \frac{1}{1 + x\, m_H/2E_1} . \tag{13} $$

Then from Eq. (11)

$$ L_{\mu\nu}W_H^{\nu\mu}(q,P) = 8E_1 m_H \left[ x y\, W_1 + \frac{1-y}{y}\, \nu W_2 \right] . \tag{14} $$

Since

$$ d^3p_2 \to 2\pi E_2^2\, d(\cos\theta)\, dE_2 = \pi E_2\, dQ^2\, dy = 2\pi E_2\, \nu\, dx\, dy , \tag{15} $$

we have

$$ \frac{d\sigma}{dx\,dy} = \frac{4\pi\alpha^2}{Q^4}\, 2m_H E_1 \left[ (1-y)\,F_2(x,Q^2) + x y^2\, F_1(x,Q^2) \right] , \tag{16} $$

where $\alpha = e^2/4\pi$ and

$$ F_2(x,Q^2) = \nu W_2 , \qquad F_1(x,Q^2) = W_1 \tag{17} $$

are dimensionless, frame invariant "structure functions". Clearly, comparison of cross section measurements
with Eq. (16) allows $F_{1,2}$ to be disentangled.
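As an illustration of the kinematics above, the following minimal Python sketch evaluates the invariants $Q^2$ and $\nu$ and the dimensionless variables $x$ and $y$ from the lepton energies and scattering angle in the hadron rest frame. The function name, the default proton mass of 0.938 GeV and the sample numbers are assumptions made purely for illustration; the lepton mass is neglected, as in the text.

```python
import math

def dis_kinematics(e1, e2, theta, m_h=0.938):
    """Return (Q2, nu, x, y) in the hadron rest frame.

    e1, e2 : incoming and outgoing lepton energies in GeV
    theta  : lepton scattering angle in radians
    m_h    : hadron mass in GeV (default: proton mass, an assumption)
    """
    q2 = 2.0 * e1 * e2 * (1.0 - math.cos(theta))  # Q^2 = 2 E1 E2 (1 - cos(theta))
    nu = m_h * (e1 - e2)                          # nu = P.q = m_H (E1 - E2)
    x = q2 / (2.0 * nu)                           # x = Q^2 / (2 nu)
    y = nu / (m_h * e1)                           # y = nu / (m_H E1) = 1 - E2/E1
    return q2, nu, x, y

# Hypothetical fixed-target configuration, chosen only as an example.
q2, nu, x, y = dis_kinematics(e1=20.0, e2=10.0, theta=0.2)
print(f"Q2 = {q2:.3f} GeV^2, nu = {nu:.3f} GeV^2, x = {x:.3f}, y = {y:.3f}")
```

The returned values satisfy $Q^2 = 2x\nu$ and $0 \le x \le 1$ by construction whenever the input is a physically allowed configuration.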

In the basic process Eq. (1) the electron $e$ may be replaced by a muon without changing any
of the subsequent results. A very similar analysis also holds for inelastic scattering of neutrinos, or
anti-neutrinos, where

$$ \nu_\mu(p_1) + H(P) \to \mu^-(p_2) + X \quad\text{or}\quad \bar\nu_\mu(p_1) + H(P) \to \mu^+(p_2) + X . \tag{18} $$

The scattering is now mediated by a virtual $W^+$ or $W^-$, instead of a virtual $\gamma$, so that to first order in
the weak interaction the amplitude is similar to Eq. (2) but

$$ \frac{e^2}{Q^2} \to \frac{g^2}{8}\,\frac{1}{m_W^2 + Q^2} , \qquad \frac{g^2}{8 m_W^2} = \frac{G_F}{\sqrt{2}} . \tag{19} $$

If we assume $Q^2 \ll m_W^2$, then instead of Eq. (16) we have

$$ \frac{d\sigma}{dx\,dy} = \frac{G_F^2}{2\pi}\, 2m_H E \left( (1-y)\,F_2(x,Q^2) + x y^2\, F_1(x,Q^2) \pm y\left(1 - \tfrac{1}{2}y\right) x F_3(x,Q^2) \right) , \tag{20} $$

where now we have a parity violating structure function $F_3(x,Q^2)$. More generally Eq. (20) should
contain a factor $(1 + Q^2/m_W^2)^{-2}$.

1.2 Structure Functions - Factorisation 

$W_H^{\nu\mu}(q,P)$ could be evaluated exactly if one knew the wavefunctions of $|H\rangle$ and $|X\rangle$ in terms of quark
and gluon Fock states. In practice this is a difficult non-perturbative problem. We can assume that,
since the creation of hadrons takes place on a time (and distance) scale $O(1/\Lambda_{QCD})$,
while the creation of the final state in terms of quarks and gluons in the hard scattering happens over
the short time (and distance) scale $O(1/Q)$, we are able to sum over final state quarks and gluons rather
than hadrons, up to corrections of $O(\Lambda_{QCD}^2/Q^2)$. However, we also have the added complication that the
target hadron momentum $P$ satisfies $P^2 = m_H^2$, which is fixed, and the hadron wave function depends on
low energy scales. It is necessary to introduce a further factorisation assumption, which can be derived
to all orders in the perturbation expansion, in order to justify using the ideas of asymptotic freedom.
We use similar physical reasoning to the hadronisation in the final state and assume that the large
momentum transfer from the virtual photon takes place to a single quark which has fluctuated out of
the proton over the short time (and distance) scale $O(1/Q)$, and we can neglect the QCD interactions
between hadron constituents due to asymptotic freedom. On the larger time (and distance) scale
$O(1/\Lambda_{QCD})$ the struck quark and the remaining quarks and gluons interact strongly via QCD forces in
order to hadronise in the final state, and this process is largely independent of the former so-called
hard process. One can prove in DIS that this factorisation indeed holds up to corrections
$O(\Lambda_{QCD}^2/Q^2)$. Within this framework the leading term in the deep inelastic limit is then given in Eq.
(8) by letting $|X\rangle \to |q_f,\tilde k\rangle|X'\rangle$, as illustrated in Fig. 2, where $|q_f,\tilde k\rangle$ denotes an on-shell 'parton',
either a single quark or anti-quark state with flavour index $f$ and 4-momentum $\tilde k$, and $|X'\rangle$ denotes the
remnant of the scattered proton.

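This allows us to rewrite Eq. (8) as

$$ W_H^{\nu\mu}(q,P) \simeq \frac{1}{4\pi}\sum_f \sum_{X'} \int \frac{d^4\tilde k}{(2\pi)^4}\, 2\pi\,\delta(\tilde k^2)\,\theta(\tilde k^0)\, (2\pi)^4\delta^4(q + P - \tilde k - p_{X'}) $$
$$ \qquad\qquad \times \sum_{q\,\mathrm{spins}} Q_f^2\, \langle H,P|\bar q_f\gamma^\nu q_f|q_f,\tilde k\rangle|X'\rangle\; \langle X'|\langle q_f,\tilde k|\bar q_f\gamma^\mu q_f|H,P\rangle . \tag{21} $$

The momentum of the on-shell struck quark (or antiquark) is $\tilde k = k + q$ and the proton momentum
satisfies the momentum conservation constraint $P = k + p_{X'}$. The second part of Eq. (21) simplifies to

$$ \sum_{q\,\mathrm{spins}} Q_f^2\, \langle H,P|\bar q_f\gamma^\nu q_f|q_f,\tilde k\rangle|X'\rangle\; \langle X'|\langle q_f,\tilde k|\bar q_f\gamma^\mu q_f|H,P\rangle $$
$$ \qquad\qquad = Q_f^2\, \langle H,P|\bar q_f|X'\rangle\, \gamma^\nu\, \gamma\cdot\tilde k\, \gamma^\mu\, \langle X'|q_f|H,P\rangle . \tag{22} $$

Figure 2: Deep inelastic scattering viewed in terms of scattering from a single parton.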



Summing over both quarks and anti-quarks we can represent Eq. (8) as

$$ W_H^{\nu\mu}(q,P) \simeq \sum_f \int d^4k\; \mathrm{tr}\left( W_f^{\nu\mu}(q,k)\,\Gamma_{H,f}(P,k) + W_{\bar f}^{\nu\mu}(q,k)\,\Gamma_{H,\bar f}(P,k) \right) , \tag{23} $$

where $W_f^{\nu\mu}(q,k)$, $W_{\bar f}^{\nu\mu}(q,k)$ denote the relevant contributions when the virtual photon with momentum
$q$ couples to a quark, anti-quark, with flavour $f$ and momentum $k$,

$$ W_{\bar f}^{\mu\nu}(q,k) = W_f^{\nu\mu}(q,k) = \tfrac{1}{2} Q_f^2\, \gamma^\nu\, \gamma\cdot(k+q)\, \gamma^\mu\, \delta\left((k+q)^2\right) \theta\left(k^0 + q^0\right) , \tag{24} $$

and where we define

$$ \Gamma_{H,f}(P,k)_{\beta\alpha} = \sum_{X'} \delta^4(P - k - p_{X'})\, \langle H,P|\bar q_{f\alpha}|X'\rangle \langle X'|q_{f\beta}|H,P\rangle , $$
$$ \Gamma_{H,\bar f}(P,k)_{\alpha\beta} = \sum_{X'} \delta^4(P - k - p_{X'})\, \langle H,P|q_{f\beta}|X'\rangle \langle X'|\bar q_{f\alpha}|H,P\rangle . \tag{25} $$

The average over the hadron spins is implicit in the above. The expression Eq. (23) obtained assumes
that the quark, or anti-quark, does not interact with the state $X'$ after it couples to the virtual photon.
It may be proven in the deep inelastic limit that this is true up to contributions suppressed by terms
$O(\Lambda_{QCD}^2/Q^2)$.

It may be shown that if we define the parton density functions

$$ \frac{1}{2P^+}\int d^4k\; \delta\!\left(\frac{k^+}{P^+} - x\right) \mathrm{tr}\left(\gamma^+\,\Gamma_{H,f}(P,k)\right) = q_f(x) , $$
$$ \frac{1}{2P^+}\int d^4k\; \delta\!\left(\frac{k^+}{P^+} - x\right) \mathrm{tr}\left(\gamma^+\,\Gamma_{H,\bar f}(P,k)\right) = \bar q_f(x) , \tag{26} $$

where strictly speaking it is the light-cone momenta $k^+ = k^0 + k^3$ and $P^+ = P^0 + P^3$ which are used
in the delta function, and we are in a frame where implicitly

$$ P_\perp = q_\perp = 0 , \tag{27} $$

then we find

$$ F_1(x,Q^2) \simeq \frac{1}{2}\sum_f Q_f^2 \left( q_f(x) + \bar q_f(x) \right) . \tag{28} $$

The result in Eq. (28) demonstrates that $F_1$ depends only on the dimensionless variable $x = Q^2/2\nu$
in the deep inelastic limit, which is known as Bjorken scaling [5, 6]. The experimental observation of this
scaling was the first direct evidence for point-like constituents in hadrons [7]. The quark distribution
functions $q_f(x)$, $\bar q_f(x)$ defined by Eq. (26) for $x > 0$ are an intrinsic non-perturbative property of the
hadron $H$. They may be interpreted as momentum distributions for quarks and anti-quarks inside the
hadron and in principle (though not yet in practice) they can be computed from a non-perturbative
analysis in QCD. At present these distribution functions must simply be determined experimentally
from (largely) DIS experiments. We also find that

$$ F_2(x,Q^2) = 2xF_1(x,Q^2) = x\sum_f Q_f^2 \left( q_f(x) + \bar q_f(x) \right) . \tag{29} $$

The form of the relation between $F_1$ and $F_2$ is a consequence of the spin 1/2 nature of the struck quark.
The difference is proportional to the longitudinal structure function $F_L(x,Q^2)$, and is zero at lowest
order due to helicity conservation [8].

Applying these results to deep inelastic scattering on a proton target, the proton wavefunction is
dominated by $uud + \cdots$, where the dots indicate $uud$ plus further quarks (including heavy flavours).
With notation $q_u(x) = u(x)$, $\bar q_u(x) = \bar u(x)$ etc,

$$ F_{2,\mathrm{proton}}(x,Q^2) \simeq x\left( \tfrac{4}{9}\left(u(x) + \bar u(x)\right) + \tfrac{1}{9}\left(d(x) + \bar d(x)\right) + \text{heavy flavours} \right) . \tag{30} $$

We note that the derivation of Eq. (28) is an approximation which relies on the assumption that $k$, being
the momentum of a quark (or antiquark) inside the proton, should have a very small probability
of having any momentum components greater than $O(\Lambda_{QCD})$. As such it also implies corrections of
$O(\Lambda_{QCD}^2/Q^2)$ corresponding to higher twist operators (as discussed in [9]). However, it also ignores
rather more important higher-order QCD corrections, which we consider next.
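The leading-order parton model relations above can be made concrete with a short numerical sketch. The Python fragment below evaluates $F_2$ for the proton from up and down quark and anti-quark densities, weighting them with the squared charges 4/9 and 1/9, and obtains $F_1$ from $F_2 = 2xF_1$. The functional forms used for the densities are toy shapes invented for this example and are not taken from any fit; heavy flavours are omitted.

```python
def f2_proton_lo(x, u, ubar, d, dbar):
    """Leading-order F2 of the proton from light-quark densities.

    u, ubar, d, dbar are callables returning the parton density at x.
    Heavy flavours are left out of this sketch.
    """
    return x * (4.0 / 9.0 * (u(x) + ubar(x)) + 1.0 / 9.0 * (d(x) + dbar(x)))

def f1_from_f2(x, f2):
    """F1 from the leading-order relation F2 = 2 x F1."""
    return f2 / (2.0 * x)

# Toy densities (hypothetical shapes, for illustration only).
def u_toy(x):
    return 2.0 * x ** -0.5 * (1.0 - x) ** 3

def d_toy(x):
    return 1.0 * x ** -0.5 * (1.0 - x) ** 4

def sea_toy(x):
    return 0.1 * x ** -1.1 * (1.0 - x) ** 7

for x in (0.01, 0.1, 0.5):
    f2 = f2_proton_lo(x, u_toy, sea_toy, d_toy, sea_toy)
    print(f"x = {x:5.2f}  F2 = {f2:.4f}  F1 = {f1_from_f2(x, f2):.4f}")
```

Because no scale dependence enters, the result depends on $x$ only, which is the Bjorken scaling property discussed in the text.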

1.3 QCD Corrections 

In the previous section we have assumed that the quark interacts with the virtual $\gamma$ for large $Q^2$ with
a point-like coupling, not including any corrections due to QCD. In a field theory approach the quark
fields in the currents are treated as if they were effectively free, disregarding QCD effects. This is
ultimately justified by asymptotic freedom but there are calculable perturbative QCD corrections to
Bjorken scaling. To simplify the discussion, we examine a generic structure function $F(x,Q^2)$, such as
might be measured in deep inelastic scattering. The dominant contributions for $Q^2 \to \infty$ arise from
the elementary particles of perturbative QCD, quarks and gluons, but QCD corrections are no longer
ignored and $F(x,Q^2)$ can no longer be represented in terms of solely point-like couplings to the
quarks. Hence, we recognise that we can now create a number of quarks, antiquarks and gluons in
the final state via a hard QCD perturbative process. The point-like vertex is now also replaced by
a "coefficient function" $C_i(q,k)$ representing this hard scattering process, where $i = q_f, \bar q_f, G$ for an
incoming quark, antiquark or gluon with 4-momentum $k$ coupling to a current $J$ carrying 4-momentum
$q$, $q^2 = -Q^2$, and which includes all (perturbative) QCD corrections. Some of the leading $\alpha_S$ corrections
to the lowest order diagram are illustrated in Fig. 3.

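In the relevant limit $Q^2 = -q^2 \to \infty$, used above, with $x = Q^2/2\nu$ ($\nu = P\cdot q$) fixed, $F(x,Q^2)$ is assumed
to have the form of a sum over contributions for different $i = q_f, \bar q_f, G$. Taking into account these
considerations the expression for the structure function reduces to a single variable integral

$$ F(x,Q^2) \simeq \sum_i \int_x^1 \frac{dy}{y}\, C_i\!\left(\frac{x}{y}, \frac{Q^2}{\mu_F^2}; \alpha_S\right) f_i(y,\mu_F^2) , \tag{31} $$

where

$$ f_i(y,\mu_F^2) = \left( q_f(y,\mu_F^2),\; \bar q_f(y,\mu_F^2),\; G(y,\mu_F^2) \right) , \qquad y = \frac{k^+}{P^+} , \tag{32} $$

and we now integrate over the possible values of the momentum fraction $y$.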

The definition of the parton distributions is the same as in the previous argument except for three
points.

1. Now we also have a nonperturbative contribution corresponding to the possibility of scattering
off a gluon in the hadron.

2. The momentum fraction of the parton leaving the hadron is denoted by $y$, where $y > x$ since some
of the original momentum may be lost by branching to other particles before the scattering with
the photon which defines the variable $x$.

3. The infrared singularities in the coefficient functions, which have been regularised by $\mu_F^2$, must be
absorbed into the nonperturbative definition of $\Gamma(P,k)$, rendering it $\mu_F^2$ dependent when we include
QCD corrections. This is natural because the singularities come from the infrared limit of the
integral over $k$ where the coupling is strong and really we should be using nonperturbative physics.
The divergences are determined entirely in terms of the incoming parton, and are independent of
the particular scattering process as long as it is one which sums over final states, though we can
be slightly less inclusive and define e.g. final state jets.

Figure 3: Factorisation with QCD corrections in deep inelastic scattering.

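It is important to recognise that $F(x,Q^2)$ as a potentially measurable physical quantity must be
independent of $\mu_F$. In general for vectors $A_i$, $B_i$

$$ \mu_F\frac{d}{d\mu_F}(A_iB_i) = 0 \quad\Rightarrow\quad \mu_F\frac{d}{d\mu_F}A_i = -A_jP_{ji} , \quad \mu_F\frac{d}{d\mu_F}B_i = P_{ij}B_j , \tag{33} $$

for some $P_{ij}$. The integral convolution in Eq. (31) can be regarded similarly as a form of matrix
multiplication for two $\mu_F$-dependent factors. The analogous versions of the equations for $A$, $B$ in Eq.
(33) become integral relations

$$ \mu_F\frac{d}{d\mu_F}\, C_i\!\left(x, \frac{Q^2}{\mu_F^2}; \alpha_S\right) = -\sum_j \int_x^1 \frac{dy}{y}\, C_j\!\left(\frac{x}{y}, \frac{Q^2}{\mu_F^2}; \alpha_S\right) P_{ji}(y;\alpha_S) , \tag{34} $$

$$ \mu_F\frac{d}{d\mu_F}\, f_i(x,\mu_F^2) = \sum_j \int_x^1 \frac{dy}{y}\, P_{ij}\!\left(\frac{x}{y};\alpha_S\right) f_j(y,\mu_F^2) , \tag{35} $$

where the $P_{ij}(y;\alpha_S)$ are determined by the form of the infrared divergences regularised by $\mu_F$ and
absorbed into the nonperturbative definition of the partons. As such they are independent of $Q^2$, the
particular current $J$ and the hadron $H$, and may be determined as an expansion in $\alpha_S$ from Eq. (34).
In general all components of $P_{ij}(y;\alpha_S)$ are non zero.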

The equations (34, 35) are referred to as the DGLAP equations [10, 11, 12, 13], and the perturbatively
calculable $P_{ij}(y;\alpha_S)$ are known as splitting functions. They were effectively derived as anomalous
dimensions of operators within the context of the renormalisation group and operator product
expansion [14, 15]. The coefficient functions and splitting functions were obtained at next to leading
order (NLO) ($O(\alpha_S)$ for the coefficient functions and $O(\alpha_S^2)$ for splitting functions) within a few
years [16, 17, 18, 19, 20, 21, 22, 23]. In the above equations we should take $\alpha_S \to \alpha_S(\mu_R^2)$, the running
coupling. It is important to note that $\alpha_S(\mu_R^2)$ is a function of the renormalisation scale $\mu_R$, not the
factorisation scale $\mu_F$, since its running is determined by the renormalisation of the ultraviolet divergences
in the theory, and has nothing to do with the infrared regularisation which introduces $\mu_F$.

Since $\mu_R$ and $\mu_F$ are arbitrary we may choose their values independently. However, it is natural,
and very common, to set $\mu_R^2 = \mu_F^2 = Q^2$ so that Eq. (31) becomes

$$ F(x,Q^2) \simeq \sum_i \int_x^1 \frac{dy}{y}\, C_i\!\left(\frac{x}{y}, 1; \alpha_S(Q^2)\right) f_i(y,Q^2) \equiv \sum_i C_i(\alpha_S(Q^2)) \otimes f_i(Q^2) , \tag{36} $$

where from (35)

$$ Q\frac{d}{dQ}\, f_i(x,Q^2) = \sum_j \int_x^1 \frac{dy}{y}\, P_{ij}\!\left(\frac{x}{y};\alpha_S(Q^2)\right) f_j(y,Q^2) . \tag{37} $$

The results Eq. (36) and Eq. (37) then provide the justification for the claim that asymptotic freedom
allows the evolution of $F(x,Q^2)$ to be calculated perturbatively in the deep inelastic limit. Hence,
once we have measured the parton distributions at some low scale $Q_0^2$ we can calculate their evolution
to higher scales perturbatively. Comparison of theory and data on structure functions and their scaling
violations works extremely well, and is one of the best tests of QCD.
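The convolution denoted by $\otimes$ in Eq. (36) can be evaluated numerically in a few lines. The sketch below is a minimal Python illustration, assuming the convolution takes the standard form $(C \otimes f)(x) = \int_x^1 (dy/y)\, C(x/y)\, f(y)$ and that both factors are smooth functions; real coefficient and splitting functions contain delta functions and plus-distributions, which this simple midpoint rule does not handle. The function names and the test functions are hypothetical.

```python
def mellin_convolution(c, f, x, n=20000):
    """Approximate (c x f)(x) = int_x^1 dy/y c(x/y) f(y) with the midpoint rule.

    c, f : smooth callables on (0, 1]
    x    : momentum fraction, 0 < x < 1
    n    : number of integration intervals
    """
    h = (1.0 - x) / n
    total = 0.0
    for i in range(n):
        y = x + (i + 0.5) * h          # midpoint of the i-th interval
        total += c(x / y) * f(y) / y
    return total * h

# Toy check with power laws: for c(z) = z and f(y) = y^2 the integrand is
# constant and the exact result is x (1 - x).
value = mellin_convolution(lambda z: z, lambda y: y * y, x=0.5)
print(value)
```

In practice the parton distributions are tabulated on a grid in $y$ and the integral is evaluated with higher-order quadrature, but the structure of the calculation is the same.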

We can apply the same sort of reasoning as above to hadron-hadron collisions. The coefficient
functions $C_i^P(x,\alpha_S(\mu^2))$ describing a particular hard scattering process involving incoming partons are
process dependent but are calculable as a power series in the strong coupling constant $\alpha_S(\mu^2)$,

$$ C_i^P(x,\alpha_S(\mu^2)) = \sum_k C_i^{P,k}(x)\, \alpha_S^k(\mu^2) . $$

The scale of the coupling will be set by the hard scale in the particular process, e.g. if one produces a
particle with large mass $m$ in the final state then $\mu^2 = m^2$. If there is no hard scale in the perturbative
scattering process, e.g. if we simply have proton-proton scattering to hadrons with no identified hard
final state, perturbation theory cannot be reliably used. Since the parton distributions $f_i(x,Q^2)$ are
process-independent, i.e. universal, once they have been measured at one experiment, one can predict
many other scattering processes. Consider for example the diagram for proton-proton scattering to
form hadrons plus a Higgs boson, a contribution to which is shown in Fig. 4.

The definition of the parton distributions is exactly the same for this diagram as it is in Deep
Inelastic Scattering. Hence, once we calculate $C_{ij}^H(x_i, x_j, \alpha_S(m_H^2))$ we can calculate the cross section for
Higgs production at a proton-proton collider, i.e. the Tevatron and/or Large Hadron Collider (LHC).
This is given simply by

$$ \sigma_H(x_1,x_2,m_H^2) = \sum_{i,j} \int_{x_1}^1 \frac{dy_1}{y_1} \int_{x_2}^1 \frac{dy_2}{y_2}\, C_{ij}^H\!\left(\frac{x_1}{y_1}, \frac{x_2}{y_2}, \alpha_S(m_H^2)\right) f_i(y_1,m_H^2)\, f_j(y_2,m_H^2) . \tag{38} $$

Figure 4: Factorisation in a hadron-hadron collision.



This general procedure can be applied to any process, so although parton distributions are essentially
nonperturbative their determination in a small number of experiments then leads to huge predictive
power. A comprehensive discussion of factorisation and its proof can be found in [24]. We finally note
that the definition of the parton distributions can be altered so long as the coefficient functions are
changed in a compensating manner such that all physical quantities are invariant. This is called a
change of factorisation scheme. In the majority of cases we work with PDFs defined in the $\overline{\rm MS}$ scheme,
since calculations are done in a manner where both ultraviolet and infrared divergences are removed using
dimensional regularisation and the $\overline{\rm MS}$ procedure (the manner of dealing with ultraviolet divergences
defining the renormalisation scheme and infrared divergences defining the factorisation scheme). An
alternative factorisation scheme sometimes used is the DIS scheme [25], which is simply defined so that
the LO relationship between structure functions and quarks in Eq. (30) is true to all orders.

1.4 Recent Progress on Experimental Data 

As will be shown throughout this report, many different experiments have made measurements which 
can be used to constrain the structure of the proton. DIS experiments constitute by far the most 
important data input, and in the last two decades it has been especially the HERA electron proton 
collider that has led to spectacular progress in the understanding of the proton structure. 

HERA started to collect ep collisions in 1992, at first at a centre of mass energy of 300 GeV, and later 
at 320 GeV. HERA collided 27.6 GeV electrons/positrons on 820 (920) GeV protons, which allowed a 
measurement of structure functions at x values down to ~ 10~^ and to values of up to ~ 50, 000 
GeV$^2$. HERA not only opened a new kinematic domain for DIS; the collider experimental
environment also allowed the use of different types of detectors, as compared to the classical DIS fixed
target experiments. In particular the hadronic final state is fully measurable for the majority of the
ep collisions in the collider experiments H1 and ZEUS, allowing either for an excellent control of the
systematics of the measurements, or for the use of hybrid methods based on scattered electron and hadronic
final state reconstruction of the kinematical variables in each collision. Even the radiative corrections
can be checked in part, due to the detection of the emitted photons in the direction of the electron beam.

As a result, in the low and medium $Q^2$ range, where the statistics is abundant, the structure function
measurements have an accuracy of 1-2%, and thus allow for a very precise determination of the quark
content of the proton. The most recent structure function results, based on the combined H1 and ZEUS
data, and compared to lower energy fixed target experimental data, are shown on the left of Fig. 5. These
data have been used in QCD fits, as will be described below, and the resulting parton distributions are
shown on the right of Fig. 5. The blue bands show uncertainties on these parton distributions. Much of
the discussion in this report will focus on how such error bands can be determined. The importance of the
HERA data for e.g. particle production at the LHC is shown in Fig. 6, which shows the effect
of the HERA data on the determination of the quark distribution and on the precision of the Z production
cross section at the LHC for collisions at a centre of mass energy of 14 TeV. The uncertainties are
greatly reduced when HERA data are included, compared to the result in a world without HERA data.

Figure 5: On the left the HERA combined NC $e^+p$ reduced cross section and fixed-target data as a
function of $Q^2$. The error bars indicate the total experimental uncertainty. The HERAPDF1.0 fit is
superimposed. On the right the bands represent the total uncertainty of the fit. Dashed lines are
shown for $Q^2$ values not included in the QCD analysis. The parton distribution functions obtained
using the ACOT heavy-flavour scheme compared to HERAPDF1.0 for $xu_v$, $xd_v$, $xS = 2x(\bar U + \bar D)$, $xg$, at
$Q^2 = 10$ GeV$^2$. The bands show total uncertainties of the HERAPDF1.0 fit.

Apart from the fully inclusive measurements, the HERA data also allow for measurements
of final states including jets (which will help to constrain $\alpha_S$ in the fits) and, more importantly, the
presence of silicon vertex detectors in the H1 and ZEUS experiments allows for the tagging of collisions
with heavy flavours. The cross sections of events with either charm or bottom quarks in the final state
can be used to determine the $F_2^{c\bar c}(x,Q^2)$ and $F_2^{b\bar b}(x,Q^2)$ heavy flavour structure functions. The present
data analysed from run I at HERA (1992-2000) allow for measurements with a precision at the level
of 15% for charm and 30% for bottom tagged structure functions. Just before its closure in 2007 the
HERA machine was operated at reduced proton beam energies, namely 575 and 460 GeV, and hence at
reduced centre of mass energies. Measuring the cross section for given $x$, $Q^2$ values at different centre
of mass energies is equivalent to measuring at a different value of $y$. Thus the combination of
measurements at different energies allows one to disentangle
$F_2(x,Q^2)$ and $F_1(x,Q^2)$ ($F_L(x,Q^2)$) in Eq. (16). These measurements provide extra constraints in the
QCD fits and the $F_L(x,Q^2)$ data in particular are directly sensitive to the gluon density distributions.
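The statement that a change of centre of mass energy at fixed $x$ and $Q^2$ corresponds to a change of $y$ can be checked with a short calculation. The Python sketch below uses the massless approximations $s = 4E_eE_p$ for head-on beams and $Q^2 = xys$; the beam energies are the nominal HERA values quoted earlier in this section, while the function names and the chosen $(x, Q^2)$ point are assumptions made for illustration.

```python
import math

def sqrt_s(e_lepton, e_proton):
    """Centre of mass energy in GeV for head-on beams, masses neglected."""
    return math.sqrt(4.0 * e_lepton * e_proton)

def inelasticity(x, q2, e_lepton, e_proton):
    """y = Q^2 / (x s), with s = 4 E_e E_p (proton mass neglected)."""
    return q2 / (x * 4.0 * e_lepton * e_proton)

E_LEPTON = 27.6  # GeV, HERA lepton beam energy

for e_proton in (820.0, 920.0):
    rs = sqrt_s(E_LEPTON, e_proton)
    y = inelasticity(x=1.0e-3, q2=10.0, e_lepton=E_LEPTON, e_proton=e_proton)
    print(f"E_p = {e_proton:5.0f} GeV  sqrt(s) = {rs:6.1f} GeV  y = {y:.4f}")
```

The two proton beam energies reproduce the centre of mass energies of about 300 and 320 GeV, and the lower energy gives the larger $y$ at the same $(x, Q^2)$, which is what makes the separation of the two structure functions in the cross section possible.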

Besides the structure function data, important new data for constraining PDFs come from Tevatron
measurements such as the di-jets, Z production and the W asymmetries at the collider experiments,
as well as Drell-Yan measurements. The usage and impact of these data sets will be discussed in the
following sections. In future, similar data from the LHC itself will be useful input for further
constraining the proton structure. The experiments are gearing up to make such measurements.

Figure 6: The up quark distribution extracted from a global fit together with its uncertainty, with and
without HERA data (left), and the rapidity distribution for production of Z at the LHC at
14 TeV, and the uncertainty for the same PDF fits with and without HERA data (right).

2 Fits to Determine Parton Distributions 

2.1 General Procedure 

From the previous section we see that in order to make predictions for hard scattering processes at
any collider which uses hadrons in the initial state we must first extract the parton
distributions of the hadron by fitting to the existing data which constrain these parton distributions.
The process for doing this is generally called a "global fit", although there is a wide variety of definitions
of what "global" actually means. However, in all cases the basic procedure is very similar. Global
fits [26, 27, 28, 29, 30, 31] to determine parton distributions use available data, in all cases largely
$ep \to eX$ (Structure Functions), and the most up-to-date QCD calculations to best determine the
parton distributions and their consequences. Currently the default is to use NLO in $\alpha_S(Q^2)$, i.e. for
the coefficient functions and splitting functions this means

$$ C_i^P(x,\alpha_S(Q^2)) = \alpha_S^P(Q^2)\, C_i^{P,0}(x) + \alpha_S^{P+1}(Q^2)\, C_i^{P,1}(x) , $$
$$ P_{ij}(x,\alpha_S(Q^2)) = \alpha_S(Q^2)\, P_{ij}^{0}(x) + \alpha_S^2(Q^2)\, P_{ij}^{1}(x) , \tag{39} $$

where $P$ is process dependent, e.g. $P = 0$ for deep inelastic scattering where the lowest order process is
an electroweak boson scattering from a quark, but $P = 2$ for jet production in hadron-hadron collisions
where an example of a leading-order process is parton-parton annihilation to form a new parton-parton
pair. NNLO coefficient functions are known for some processes, e.g. structure functions [32, 33, 34, 35,
36, 37], and NNLO splitting functions have been completed [38, 39]. Full NNLO fits [26, 30, 31] are now
just possible using some (arguably) very good, and continually improving, approximations, though
the precise form of the approximation depends on the group performing the fit.

Perturbation theory is usually thought to be valid if $\alpha_s(Q^2) \lesssim 0.4 - 0.5$. Since the running coupling
constant $\alpha_s(Q^2)$ is roughly equal to

$$\alpha_s(Q^2) \approx \frac{4\pi}{(11 - \tfrac{2}{3}N_f)\,\ln(Q^2/\Lambda^2_{QCD})}, \tag{40}$$

where $\Lambda_{QCD}$ is the scale of hadronic physics, i.e. ~150 MeV at LO in QCD, one can use perturbation
theory if $Q^2 > 2$ GeV$^2$. This cut should also remove the influence of higher twists. Hence, most global
fits start evolution at $Q_0^2$ in a range from about 1 - 5 GeV$^2$ (an exception is [31], which starts somewhat
lower) and fit data with a minimum $Q^2$ of about 2 - 5 GeV$^2$. Additional cuts and/or higher twist
corrections are also generally applied, as discussed later.
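As a rough numerical illustration of Eq. (40), the sketch below evaluates the LO coupling at a few scales. The inputs $\Lambda_{QCD}$ = 0.15 GeV and $N_f$ = 4 are assumptions chosen for illustration, and the function name is hypothetical; this is not the coupling used by any fitting group.

```python
import math

def alpha_s_lo(q2, lambda_qcd=0.15, n_f=4):
    """LO running coupling of Eq. (40); q2 in GeV^2, lambda_qcd in GeV.

    lambda_qcd and n_f are illustrative assumptions, not fitted values.
    """
    return 4.0 * math.pi / ((11.0 - 2.0 * n_f / 3.0) * math.log(q2 / lambda_qcd**2))

for q2 in (1.0, 2.0, 10.0, 100.0):
    print(f"Q^2 = {q2:6.1f} GeV^2  alpha_s ~ {alpha_s_lo(q2):.3f}")
```

With these assumed inputs the coupling is already below 0.4 at $Q^2 = 2$ GeV$^2$ and falls slowly (logarithmically) above that, in line with the cut quoted above.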

In principle there are 13 different parton distributions to consider

$$u, \bar{u},\ d, \bar{d},\ s, \bar{s},\ c, \bar{c},\ b, \bar{b},\ t, \bar{t},\ g. \tag{41}$$

However, $m_c, m_b, m_t \gg \Lambda_{QCD}$, so these heavy flavour parton distributions are determined perturbatively.
Moreover, even at the LHC we are at energies not very far above the threshold for top production and
it is normally most useful to consider top quarks as only final state particles. Hence, most PDF sets do not
include the top quark and antiquark as a parton. This is consistent with the majority of cross section
calculations which use a renormalisation scheme where top is indeed assumed to only be created in the
final state. There are models of nonperturbative "intrinsic" charm and bottom quark contributions [40],
and some fits investigate the importance of these, but they are currently not part of the default fit for
any group.

Until recently it has been standard to assume that $s = \bar{s}$. This leaves 6 independent combinations
of partons. However, with the most recent data there is some constraint on $s - \bar{s}$ (as discussed later)
and currently this is allowed to be nonzero at input in some sets [26, 28], though at NNLO a tiny
asymmetry is generated by evolution even if it is zero at input [41]. Until the most recent sets it was
also common to use $s(Q_0^2) = \tfrac{\kappa}{2}(\bar{u}(Q_0^2) + \bar{d}(Q_0^2))$, where in practice $\kappa \sim 0.4$, but this is also becoming
more sophisticated with the most recent fits, and some shape as well as normalisation difference is
usually allowed when comparing the strange and light quark sea.

For the up and down quarks and antiquarks and the gluon there are then five degrees of freedom
(which may overlap with the strange quark parameterisation as explained above). These can be represented
in a variety of fashions, but some are more obviously useful than others. For example MSTW
use

$$u_V = u - \bar{u}, \quad d_V = d - \bar{d}, \quad \bar{d} - \bar{u}, \quad \mathrm{sea} = 2(\bar{u} + \bar{d} + \bar{s}), \quad g, \tag{42}$$

where the first three combinations are all nonsinglet combinations. Even though it is a combination of
the distributions already mentioned it is also often useful to define the singlet quark distribution

$$\Sigma = u_V + d_V + \mathrm{sea} + (c + \bar{c}) + (b + \bar{b}). \tag{43}$$

For each group the input partons are parametrised in some fashion (though for [28] this involves a
very large effective number of parameters). For example in [26] a number of the input distributions have
the general form

$$x f(x, Q_0^2) = (1 - x)^{\eta}\,(1 + \epsilon x^{0.5} + \gamma x)\,x^{\delta}. \tag{44}$$

There is much variation, but all groups (including NNPDF) include the general feature of a power of
$(1 - x)$ as $x \to 1$ and a power (or possibly two powers) of x as $x \to 0$. For non-singlet combinations,
e.g. the valence quarks and $\bar{d} - \bar{u}$, $\delta$ is expected to be ~0.5. For singlet combinations, e.g. the sea and
gluon, $\delta$ is expected to be ~0. The values extracted may vary significantly from these expectations,
and are rather dependent on the value of $Q_0^2$ taken, particularly for the gluon distribution.
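The limiting behaviour of the input form in Eq. (44) can be checked numerically. The following is a minimal sketch with hypothetical parameter values (they are not taken from any published fit); it only illustrates that the $x^{\delta}$ factor controls small x and the $(1-x)^{\eta}$ factor controls large x.

```python
def xf_input(x, eta, eps, gamma, delta):
    """Input shape of Eq. (44): x f = (1-x)^eta (1 + eps*sqrt(x) + gamma*x) x^delta."""
    return (1.0 - x) ** eta * (1.0 + eps * x ** 0.5 + gamma * x) * x ** delta

# Hypothetical parameter values, chosen only to show the limiting behaviour.
params = dict(eta=3.0, eps=-1.0, gamma=5.0, delta=0.5)

# Small x: the x^delta factor dominates, so halving x scales x f by about 2^(-delta).
small_ratio = xf_input(5e-7, **params) / xf_input(1e-6, **params)
# Large x: the (1-x)^eta factor dominates, so halving (1-x) scales x f by about 2^(-eta).
large_ratio = xf_input(1 - 5e-7, **params) / xf_input(1 - 1e-6, **params)
print(small_ratio, large_ratio)
```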

Much of the structure function data is obtained from scattering off a deuterium target, so in practice
one also needs to define the parton distributions for the neutron. In the default fits this is always done
by assuming charge symmetry, i.e. a transformation from $p \to n$ corresponds to

$$d(x) \to u(x), \qquad u(x) \to d(x), \tag{45}$$

i.e. in this exact limit for a neutron target the LO expression equivalent to Eq. (30) is

$$F_{2,\mathrm{neutron}}(x,Q^2) \approx x\left(\tfrac{4}{9}(d(x) + \bar{d}(x)) + \tfrac{1}{9}(u(x) + \bar{u}(x)) + \text{heavy flavours}\right). \tag{46}$$

In practice not all the parameters in the inputs for the parton distributions are free. There are a
number of sum rules constraining the parton inputs and which are maintained in the evolution equations
order by order in $\alpha_s$. There are the rules

$$\int_0^1 u_V(x)\,dx = 2, \qquad \int_0^1 d_V(x)\,dx = 1, \tag{47}$$

i.e. the conservation of the number of valence quarks. There is also the conservation of the momentum
carried by the partons

$$\int_0^1 \left[x\Sigma(x) + x g(x)\right] dx = 1. \tag{48}$$

This turns out to be an important constraint on the form of the gluon distribution, which is less directly
and precisely constrained than the quarks.
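To see how a number sum rule removes one free parameter, the sketch below fixes the normalisation of a simplified valence shape $x q_V = A\,x^{\delta}(1-x)^{\eta}$ (an assumption: the polynomial factor of Eq. (44) is dropped so that the integral is a Beta function) and then checks Eq. (47) numerically. The shape parameters and function names are hypothetical.

```python
import math

def valence_norm(n_quarks, delta, eta):
    """Normalisation A such that the integral of A x^(delta-1) (1-x)^eta over [0,1] equals n_quarks."""
    beta = math.gamma(delta) * math.gamma(eta + 1.0) / math.gamma(delta + eta + 1.0)
    return n_quarks / beta

def number_integral(norm, delta, eta, steps=20000):
    """Midpoint-rule check, using x = t^2 to tame the integrable x^(delta-1) endpoint behaviour."""
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) / steps
        x = t * t
        total += norm * x ** (delta - 1.0) * (1.0 - x) ** eta * 2.0 * t
    return total / steps

# Hypothetical shape parameters for illustration only.
a_u = valence_norm(2, delta=0.5, eta=3.0)
a_d = valence_norm(1, delta=0.5, eta=4.0)
print(number_integral(a_u, 0.5, 3.0), number_integral(a_d, 0.5, 4.0))
```

In a real fit the same idea is applied to the full parameterisation, and the momentum sum rule of Eq. (48) is typically used in the same way to fix one gluon parameter.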

In determining the full sets of parton distributions we need to consider that not only are there at 
least 6 different combinations of partons, but there is also an extremely wide range of x both
probed and needed, which extends, in the former case, from x = 0.75 to x = 0.00003. Hence, in
practice we need very many different types of experiment for a full and precise determination of all 
parton distributions. 



2.2 Large-x Quarks 

Let us consider how each type of parton distribution is determined by a fit to experimental data. We 
start with probably the most obvious example of the quark distributions at large x. In the simplest 
parton model the parton distribution is dominated by up and down valence quarks with x ~ 0.3. In 
detail the picture is much more complicated, but the up and down valence distributions for x > 0.1 are 
indeed a major constituent of the proton and, until we approach x = 1 are very precisely determined. 
For $x > 0.1$ the quark distributions are determined almost entirely by comparison to structure function
data. In this region they are dominated by the non-singlet valence distributions, i.e. one is unlikely to
find sea quarks or gluons as $x \to 1$. The approximation of a non-singlet distribution leads to a simple
evolution of the parton distributions and conversion to the structure functions

$$\frac{d f^{NS}(x,Q^2)}{d\ln Q^2} = P^{NS}(x,\alpha_s(Q^2)) \otimes f^{NS}(x,Q^2),$$
$$F_2^{NS}(x,Q^2) = C^{NS}(x,\alpha_s(Q^2)) \otimes f^{NS}(x,Q^2).$$

This means that the evolution of the high-x structure functions is a good test of the theory of QCD
and provides a direct measurement of $\alpha_s(Q^2)$, which is the only parameter in the equations above other than
the parton distribution. However, this very clean picture is disturbed somewhat by the fact that
perturbation theory involves contributions to coefficient functions $\sim \alpha_s^n(Q^2)\ln^{2n-1}(1 - x)$. Related to
Figure 7: The description [12] of large-x BCDMS and SLAC measurements of $F_2^p$ (left) and
of large-x NMC measurements of $F_2^n/F_2^p$ (right).



this reduced convergence of perturbation theory is the fact that higher twist corrections of the form
$\Lambda^2_{QCD}/Q^2$ are known to be enhanced as $x \to 1$. Hence, it is common to impose a cut on data and require
$W^2 = Q^2(1/x - 1) + m_p^2$ to be greater than 10 - 15 GeV$^2$ to avoid contamination of perturbation theory
(or alternatively to put in a parameterisation of higher twists to simultaneously fit these corrections [30]).
This leads to the precision of the extraction of the quarks becoming more limited as x increases above
about 0.6.
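The effect of the $W^2$ cut on high-x points can be sketched as follows. The threshold of 12.5 GeV$^2$ is an assumed value inside the 10 - 15 GeV$^2$ range quoted above, and the kinematic points and function names are hypothetical.

```python
M_PROTON = 0.938  # proton mass in GeV

def w2(x, q2):
    """Invariant mass squared of the hadronic final state, W^2 = Q^2 (1/x - 1) + m_p^2, in GeV^2."""
    return q2 * (1.0 / x - 1.0) + M_PROTON ** 2

def passes_cut(x, q2, w2_min=12.5):
    """w2_min = 12.5 GeV^2 is an assumed value inside the 10-15 GeV^2 range quoted in the text."""
    return w2(x, q2) > w2_min

# Hypothetical kinematic points (x, Q^2 in GeV^2).
points = [(0.75, 10.0), (0.75, 50.0), (0.3, 10.0)]
kept = [p for p in points if passes_cut(*p)]
print(kept)
```

At fixed x the cut removes the lower-$Q^2$ points first, which is why the usable lever arm in $Q^2$ shrinks as x grows.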

There are various different types of structure function data which constrain high-x quarks. The
most obvious is charged lepton proton scattering, for which the differential cross section is

$$\frac{d^2\sigma}{dx\,dQ^2} = \frac{2\pi\alpha^2}{x Q^4}\left[(1 + (1 - y)^2)F_2(x,Q^2) - y^2 F_L(x,Q^2)\right], \tag{49}$$

where we ignore W and Z exchange (which is a small correction except for the highest-$Q^2$ HERA data),
and where $y = Q^2/xs$. Both $F_L(x,Q^2)$ and y are usually small so the cross section is effectively a
measure of $F_2(x,Q^2)$. At LO

$$F_2^p(x) \approx x\left[\tfrac{4}{9}(u + \bar{u} + c + \bar{c}) + \tfrac{1}{9}(d + \bar{d} + s + \bar{s})\right],$$
$$F_2^n(x) \approx x\left[\tfrac{4}{9}(d + \bar{d} + c + \bar{c}) + \tfrac{1}{9}(u + \bar{u} + s + \bar{s})\right]. \tag{50}$$

This means that SLACp], BCDMS [44], NMC [45] and E665 [46] data on $F_2^p(x,Q^2)$ and $F_2^d(x,Q^2)$
SHI SSI SS] and a dedicated measurement by NMC of $F_2^n(x,Q^2)/F_2^p(x,Q^2)$ [49] help determine high-x
parton distributions dominated by valence quarks. The fall of the structure functions, and implicitly
parton distributions, at high x is shown in the left of Fig. 7 for SLAC and BCDMS $F_2^p(x,Q^2)$ data,
where high-x partons evolve through splitting to smaller-x partons. The NMC data translated into the
form $F_2^n(x,Q^2)/F_2^p(x,Q^2)$ is shown in the right of Fig. 7 and compared to two PDF sets which postdate
and predate this data. One sees that the ratio falls as x approaches 1, leading to the clear conclusion
Figure 8: Large and smaller x measurements of $F_2$ (left) and $xF_3$ (right) in neutrino
scattering [52].



that $d(x,Q^2)$ falls more quickly than $u(x,Q^2)$ in this limit. However, the behaviour as x reaches 1 is
not determined.
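The link between the falling $F_2^n/F_2^p$ ratio and the d/u ratio follows directly from the LO expressions in Eq. (50). A minimal sketch, using valence-only toy inputs (hypothetical numbers, not PDF values):

```python
def f2_lo(x, u, ubar, d, dbar, s=0.0, sbar=0.0, c=0.0, cbar=0.0):
    """LO F2 of Eq. (50) for proton and neutron, with the neutron from charge symmetry (u <-> d)."""
    f2p = x * (4.0 / 9.0 * (u + ubar + c + cbar) + 1.0 / 9.0 * (d + dbar + s + sbar))
    f2n = x * (4.0 / 9.0 * (d + dbar + c + cbar) + 1.0 / 9.0 * (u + ubar + s + sbar))
    return f2p, f2n

# Valence-only toy inputs at large x (hypothetical numbers): the ratio depends only on d/u.
for d_over_u in (0.5, 0.2, 0.0):
    f2p, f2n = f2_lo(0.7, u=1.0, ubar=0.0, d=d_over_u, dbar=0.0)
    print(d_over_u, f2n / f2p)
```

In this valence-only limit the ratio is $(1 + 4d/u)/(4 + d/u)$, which would tend to 1/4 if $d/u \to 0$ and to 1 if $d = u$.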

Complementary information can be obtained by fitting to charged-current (neutrino) DIS data. This
is obtained by the CCFR [50], NuTeV [51] and CHORUS [52] collaborations. In this case the differential
cross section is

$$\frac{d^2\sigma^{\nu,\bar{\nu}}}{dx\,dQ^2} \propto F_2^{\nu,\bar{\nu}}(x,Q^2)\left[(1 - y) + \frac{y^2}{2(1 + R(x,Q^2))}\right] \pm xF_3^{\nu,\bar{\nu}}(x,Q^2)\,y(1 - y/2), \tag{51}$$

where $R = F_L/(F_2 - F_L)$ and $F_3$ appears due to parity violation. For the proton at LO

$$F_2^{\nu} = 2x[d + s + \bar{u} + \bar{c}],$$
$$F_2^{\bar{\nu}} = 2x[u + c + \bar{d} + \bar{s}],$$
$$xF_3^{\nu} = 2x[d + s - \bar{u} - \bar{c}],$$
$$xF_3^{\bar{\nu}} = 2x[u + c - \bar{d} - \bar{s}]. \tag{52}$$

Therefore

$$F_2^{\nu} + F_2^{\bar{\nu}} = 2x\sum_i (q_i + \bar{q}_i) = 2x\Sigma,$$
$$F_3^{\nu} + F_3^{\bar{\nu}} = 2(u_V + d_V),$$

assuming $s = \bar{s}$ and $c = \bar{c}$ in the latter. In fact, in order to maximise the cross section the CCFR and
NuTeV measurements are made using an iron target and the CHORUS measurements a lead target.
Both must be corrected to an isoscalar target, i.e. $F^N = \tfrac{1}{2}(F^p + F^n)$, so using the charge symmetry
relationship in Eq. (45) we obtain

$$xF_3^{\nu N} = x(u_V + d_V) + 2x[s - \bar{c}],$$
$$xF_3^{\bar{\nu} N} = x(u_V + d_V) - 2x[\bar{s} - c]. \tag{53}$$

Figure 9: HERA charged current $e^{\pm}p$ cross sections, one lepton charge in each of the left and right panels [29].



The results of the measurements by CCFR and CHORUS are shown in Fig. 8 for both $F_2(x,Q^2)$ and
$xF_3(x,Q^2)$. At high x both structure functions are direct tests of the total valence quark distribution and
are very similar to each other. At lower x they begin to differ due to the much larger sea quark contribution
to $F_2(x,Q^2)$.
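The sum relations for the LO neutrino structure functions can be verified with a few lines of code. This is only a consistency sketch: the parton values at a single x are hypothetical, with $s = \bar{s}$ and $c = \bar{c}$ imposed by hand.

```python
def neutrino_sf(x, u, ubar, d, dbar, s, sbar, c, cbar):
    """LO proton structure functions of Eq. (52); returns (F2_nu, F2_nubar, xF3_nu, xF3_nubar)."""
    f2_nu = 2 * x * (d + s + ubar + cbar)
    f2_nubar = 2 * x * (u + c + dbar + sbar)
    xf3_nu = 2 * x * (d + s - ubar - cbar)
    xf3_nubar = 2 * x * (u + c - dbar - sbar)
    return f2_nu, f2_nubar, xf3_nu, xf3_nubar

# Hypothetical parton values at one x, with s = sbar and c = cbar.
x = 0.1
pdf = dict(u=5.0, ubar=1.0, d=3.0, dbar=1.2, s=0.4, sbar=0.4, c=0.1, cbar=0.1)
f2_nu, f2_nubar, xf3_nu, xf3_nubar = neutrino_sf(x, **pdf)
singlet = sum(pdf.values())
valence = (pdf["u"] - pdf["ubar"]) + (pdf["d"] - pdf["dbar"])
print(f2_nu + f2_nubar, 2 * x * singlet)    # F2 sum measures the singlet
print(xf3_nu + xf3_nubar, 2 * x * valence)  # xF3 sum measures the total valence
```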

As with the charged lepton scattering there are issues which make the relationship of the structure
functions to the quark distributions less direct. Again there are higher twist contributions, and these
may be more significant for $F_3(x,Q^2)$ than for $F_2(x,Q^2)$ [53, 54] (the latter being protected by the Adler
sum rule, i.e. $\int_0^1 F_2^{HT}(x,Q^2)\,dx = 0$). There is also the issue that parton distributions in nucleons in
nuclei are not expected to be exactly the same as for free nucleons. This means that global fits generally
use nuclear corrections determined by fits to exclusively nuclear targets (this is not done for the
default set in [28]). This is a subject worthy of review in its own right, but examples of corrections can
be found in [55, 56, 57]. Comparison between theory and data is good, and provides additional information
to help in the determination of the valence quarks at high x. (In principle nuclear corrections should
also be applied in fits to deuterium data. However, it is assumed these are very small and global fits
largely ignore them.)

There is also HERA charged-current data at high $Q^2$ [58, 59, 60, 61] which provides information on
valence quarks and flavour decomposition. In principle it is superior to both the lower-energy fixed
target neutral current data and the charged current data from nuclear targets since it is essentially
free from higher twist corrections and is completely free from nuclear target corrections. However, the
data analysed and published so far have low statistics, even when combined [29]. These data are shown
compared to the fit in [21] in Fig. 9. They do not currently provide comparable constraint to the
fixed target data even taking into account the theoretical uncertainties inherent in fitting to the latter.
However, the precision of the HERA data is likely to improve significantly once the full run II data has
been fully analysed.
Figure 10: The process of Drell-Yan annihilation to produce lepton pairs.



2.3 Antiquarks at Large and Moderate x 

As x decreases the sea quarks become more important. To a certain extent these are constrained by the
difference between the valence quarks extracted from the fits to $F_3(x,Q^2)$ in charged current scattering
and to $F_2(x,Q^2)$ in all measurements. However, this is rather indirect, and the fit to $F_3(x,Q^2)$ tends
to become more sensitive to nuclear corrections at smaller x, i.e. this correction is larger and more
uncertain in this region. A more direct determination, which also probes the sea quarks in regions of
$x \gtrsim 0.2$ where they are very small, comes from Drell-Yan scattering. The process is the production of
lepton pairs from quark-antiquark annihilation in proton-proton scattering, shown in Fig. 10.

This is measured in fixed target experiments E605 [62], E772 [63] and E866 [64]. In the first the target
is copper, so nuclear target corrections are required, and this data is not always used. The second seems
to have incompatibility issues with other data. The E866 data from pp collisions is often used, as seen in
Fig. 11. That from pd collisions seems incompatible with structure function and other measurements,
so is often neglected. For these fixed target experiments the kinematic variables are Feynman x, i.e. $x_F$,
and $\tau = M^2/s$, where M is the invariant mass of the dimuon pair. At LO these are related to the
momentum fractions of the partons of the hadrons by $x_F = x_1 - x_2$ and $\tau = x_1 x_2 = M^2/s$. At LO the
differential cross section is

$$\frac{d\sigma}{dM^2\,dx_F} \propto \sum_q e_q^2\left(q(x_1)\bar{q}(x_2) + q(x_2)\bar{q}(x_1)\right). \tag{54}$$

The fixed target measurements cover 4.5 GeV < M < 14 GeV and $0.02 < x_F < 0.75$.
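The LO relations $x_F = x_1 - x_2$ and $\tau = x_1 x_2$ can be inverted to give the momentum fractions probed by a given data point. A minimal sketch follows; the value $\sqrt{s}$ = 38.8 GeV (an 800 GeV proton beam on a fixed target) and the chosen kinematic point are assumptions for illustration.

```python
import math

def dy_momentum_fractions(x_f, mass, sqrt_s):
    """Solve x_F = x1 - x2 and tau = x1 * x2 = M^2 / s for the LO momentum fractions (x1, x2)."""
    tau = mass ** 2 / sqrt_s ** 2
    root = math.sqrt(x_f ** 2 + 4.0 * tau)
    return 0.5 * (root + x_f), 0.5 * (root - x_f)

# sqrt_s = 38.8 GeV is an assumed fixed-target energy (800 GeV proton beam); M in GeV.
x1, x2 = dy_momentum_fractions(x_f=0.3, mass=7.0, sqrt_s=38.8)
print(x1, x2)
```

Positive $x_F$ corresponds to $x_1 > x_2$, so the beam parton is typically a valence quark while the target parton at smaller $x_2$ is the antiquark being probed.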

Assuming the total quark distributions are already well-known from structure function data this
provides a probe of $\bar{u}$ and $\bar{d}$ in the proton for moderate $0.02 < x_2 < 0.3$. For $x > 0.1$ we find that, very
roughly, $\bar{q}(x)$ falls as a power of $(1 - x)$, much softer than the valence quarks, as expected. The NNLO correction for
the total Drell-Yan cross section has been known for many years [65]. More recently the fully differential
Drell-Yan cross-sections at NNLO were completed [66, 67, 68, 69], so all data of this form can be included
properly within an NNLO extraction of parton distributions.

Figure 12: NMC $F_2^p - F_2^n$ data points compared to a description using PDFs [12].

The Drell-Yan cross section gives information on the sum of the $\bar{u}$ and $\bar{d}$ distributions, with the
former having the larger charge weighting in the cross section, but we can also consider the precise
difference between $\bar{u}$ and $\bar{d}$. Some of this can be found from structure function measurements. The
difference is related to the Gottfried sum rule, which at LO gives

$$I_{GS} = \int_0^1 \frac{dx}{x}\left(F_2^p - F_2^n\right) = \frac{1}{3}\int_0^1 dx\,(u_V - d_V + 2\bar{u} - 2\bar{d}) = \frac{1}{3} + \frac{2}{3}\int_0^1 dx\,(\bar{u} - \bar{d}). \tag{55}$$

The left-hand side of Eq. (55) was measured by NMC in the region $0.004 < x < 0.8$ [70] at $Q^2 = 4$ GeV$^2$ and
was determined to be 0.258 ± 0.017, which implies $\int dx\,(\bar{d} - \bar{u}) \sim 0.2$. This is shown in Fig. 12 and relies
on an extrapolation to high and particularly low x, which actually provides most of the uncertainty.
Nevertheless, the evidence for $\int dx\,(\bar{u} - \bar{d}) \neq 0$ is very strong.

Figure 13: The Drell-Yan asymmetry predicted by various parton sets compared to E866
data (left) and the $\bar{d}/\bar{u}$ ratio compared to the data requirements [72].

Figure 14: The rapidity distribution of the Z boson [74] produced at the CDF experiment
compared with a prediction using CTEQ PDFs.

Information on the $\bar{d} - \bar{u}$ difference is more directly available from the Drell-Yan asymmetry

$$A_{DY} = \frac{\sigma_{pp} - \sigma_{pn}}{\sigma_{pp} + \sigma_{pn}} = \frac{1 - r}{1 + r}, \tag{56}$$

where

$$r \approx \frac{4u_1\bar{d}_2 + d_1\bar{u}_2 + 4\bar{u}_1 d_2 + \bar{d}_1 u_2}{4u_1\bar{u}_2 + d_1\bar{d}_2 + 4\bar{u}_1 u_2 + \bar{d}_1 d_2}, \tag{57}$$

and the subscripts 1 and 2 label the momentum fractions $x_1$ (beam proton) and $x_2$ (target nucleon), with the
neutron distributions obtained from charge symmetry. In fact it is the quantity

$$R_{dp} = \frac{\sigma_{pd}}{2\sigma_{pp}} = \frac{1}{2}(1 + r), \tag{58}$$

which is measured, which contains the same information.
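A short sketch of how the measured cross-section ratio responds to a $\bar{d}/\bar{u}$ asymmetry is given below. It assumes the LO ratio r of pn to pp cross sections written in terms of proton distributions at $x_1$ and $x_2$, with the neutron obtained by charge symmetry; all numerical parton values and function names are hypothetical.

```python
def dy_ratio(u1, d1, ubar1, dbar1, u2, d2, ubar2, dbar2):
    """LO r = sigma_pn / sigma_pp, using proton PDFs and charge symmetry for the neutron."""
    sigma_pn = 4 * u1 * dbar2 + d1 * ubar2 + 4 * ubar1 * d2 + dbar1 * u2
    sigma_pp = 4 * u1 * ubar2 + d1 * dbar2 + 4 * ubar1 * u2 + dbar1 * d2
    return sigma_pn / sigma_pp

def dy_observables(r):
    """Return (A_DY, sigma_pd / 2 sigma_pp) as functions of r."""
    return (1 - r) / (1 + r), 0.5 * (1 + r)

# Hypothetical values with x1 >> x2, so beam antiquarks are negligible.
r = dy_ratio(u1=2.0, d1=0.5, ubar1=0.0, dbar1=0.0, u2=3.0, d2=2.5, ubar2=0.5, dbar2=0.7)
print(r, dy_observables(r))
```

When the beam antiquarks are negligible and $4u_1 \gg d_1$, the measured ratio is close to $\tfrac{1}{2}(1 + \bar{d}_2/\bar{u}_2)$, which is why it gives such direct access to the sea asymmetry.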

There was originally one point for the Drell-Yan ratio measured by the NA51 experiment [71] at
$x_1 = x_2 = 0.18$ which implied that $\bar{d} \sim 2\bar{u}$ at this x value. This was greatly improved by measurements
from the E866/NuSea experiment [72] which made very accurate measurements from $0.04 < x < 0.3$.
This gives clear evidence of $\bar{u} - \bar{d}$ asymmetry, as seen in Fig. 13, but not as much as suggested originally
by the NA51 point. The asymmetry seems to reach a maximum at $x \sim 0.2$. It is not currently clear
what happens as $x \to 1$. The asymmetry is becoming small at the smaller x values, so the assumption
that as a nonsinglet quantity $\bar{d} - \bar{u}$ will have the same general shape as valence distributions implies it
tends quickly to zero. This is what happens with the vast majority of parameterisations of this quantity
in the parton distribution sets. However, there is no sum rule constraint, so it is not impossible that
the asymmetry becomes large, in either direction, at small x.

In the past couple of years supplementary information from an analysis of the Z-boson rapidity data
from the D0 [73] and CDF [74] experiments at the Tevatron collider has become available. This is
dominated by a narrow range of invariant mass near to $m_Z = 91.1$ GeV and is also a function of the
rapidity y where

$$y = \tfrac{1}{2}\ln\left((E + p_z)/(E - p_z)\right). \tag{59}$$

At LO in the parton kinematics $x_{1,2} = x_0\exp(\pm y)$, where $x_0 = m_Z/\sqrt{s}$, so since $\sqrt{s} = 1.96$ TeV,
$x_0 = 0.05$ and corresponds to central rapidity. Over the full range of rapidity, values of x from 0.003
up to 0.7 are probed, reaching smaller values of x than the fixed target data. Since the Tevatron is a
proton-antiproton collider at LO the differential cross section is given by

$$\frac{d\sigma}{dy} \propto \sum_q (v_q^2 + a_q^2)\left(q(x_1)q(x_2) + \bar{q}(x_2)\bar{q}(x_1)\right), \tag{60}$$

where $v_q$ and $a_q$ are the vector and axial couplings respectively. As such it is largely sensitive to the
larger quark distributions, rather than the antiquarks, particularly at high rapidity. However, the fit to
this data does rely on different flavour combinations from the previously discussed processes, so adds
extra constraints, particularly for down type quarks and antiquarks due to the higher weighting in Eq.
(60) compared to the electric charge weighting for many other processes. A comparison of a prediction
at NNLO to the D0 data is shown in Fig. 14. Clearly the comparison is good, but the accurate data
does add extra constraints.
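The x range quoted above follows from the LO kinematics $x_{1,2} = x_0\exp(\pm y)$. A minimal sketch, using the mass and collider energy given in the text (the function name and the list of rapidities are illustrative choices):

```python
import math

M_Z = 91.1       # GeV, value quoted in the text
SQRT_S = 1960.0  # GeV, Tevatron run II

def x_pair(y, mass=M_Z, sqrt_s=SQRT_S):
    """LO momentum fractions x_{1,2} = x0 exp(+-y) with x0 = mass / sqrt_s."""
    x0 = mass / sqrt_s
    return x0 * math.exp(y), x0 * math.exp(-y)

y_max = math.log(SQRT_S / M_Z)  # rapidity at which x1 would reach 1
for y in (0.0, 1.0, 2.0, 2.7):
    x1, x2 = x_pair(y)
    print(f"y = {y:3.1f}  x1 = {x1:.4f}  x2 = {x2:.4f}")
```

The product $x_1 x_2 = x_0^2$ is fixed by the Z mass, so going to high rapidity pairs a large-x parton with a small-x one.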



2.4 The Strange Quark Distribution 

It is now possible to find the strange quark distribution directly rather than simply making it an
appropriate fraction of the light sea. This is done by comparing to unlike sign dimuon production at
CCFR and NuTeV, i.e.

$$\nu_{\mu} \to \mu^- + W^+ \tag{61}$$

followed by

$$W^+ + s \to c \to D^+ \to \mu^+, \tag{62}$$

or for antineutrinos

$$\bar{\nu}_{\mu} \to \mu^+ + W^- \tag{63}$$

followed by

$$W^- + \bar{s} \to \bar{c} \to D^- \to \mu^-. \tag{64}$$

Figure 16: Ratio of strange quarks to light sea with uncertainty [26] (left) and the strange-antistrange
asymmetry [28] from NNPDF2.0 and MSTW2008 (right).



In global fits it was previously assumed that at $Q_0^2$ we have $s(x) = \kappa\,0.5(\bar{u} + \bar{d})$. $\kappa$ was determined
qualitatively by comparison to a model-dependent extraction of the strange quark from CCFR dimuon
data [75]. Using $Q_0^2 = 1$ GeV$^2$ and $\kappa \sim 0.4$ worked well, i.e. strange was about 18% of the sea
at input. Since all quarks evolve equally this fraction increases as $Q^2$ increases. However, we can
now do better and obtain a more precise normalisation and also shape for the strange distribution by
comparing to the more detailed dimuon data obtained from NuTeV [76], which also provides a more
thorough analysis of the CCFR data presented in [75]. A comparison to the fit to both neutrino and
antineutrino data from NuTeV is shown in Fig. 15; generally one finds a reduced ratio of strange to
non-strange sea compared to the previous results. There is also some additional suppression at large x,
i.e. lower $W^2$, as seen in the left of Fig. 16. This is what one would expect if the suppression compared
to the light sea is the effect of non-zero strange quark mass $m_s$.

The data constrains the strange quark in the range $0.01 < x < 0.2$. From the figure one might
possibly infer that the neutrino data is slightly higher than the antineutrino data. This implies that
$s(x) \neq \bar{s}(x)$, though the lack of net strangeness of the proton requires that

$$\int_0^1 (s(x) - \bar{s}(x))\,dx = 0, \tag{65}$$

so an excess of $s(x)$ over $\bar{s}(x)$ in one x range must be balanced by a deficit elsewhere. There have been
numerous analyses to determine if this is indeed the case [77, 78, 79, 80], with most finding evidence, if
only at about 68% confidence level, for a positive momentum asymmetry

$$\int_0^1 x(s(x) - \bar{s}(x))\,dx. \tag{66}$$

[Plot: CDF data on lepton charge asymmetry from W → eν decays, for 35 < $E_T$ < 45 GeV. Legend: CDF Run II (11 points); MSTW 2008 NLO PDF fit, $\chi^2$ = 20; same but no antiquarks; MSTW 2008 NNLO PDF fit, $\chi^2$ = 19; CTEQ6.6 NLO, $\chi^2$ = 16.]

Figure 17: Comparison [26] of fits to CDF data [84] with various parton sets.



Only NNPDF2.0 [28] and MSTW2008 [26] make the asymmetry part of their default sets. These are
shown in the right of Fig. 16 and both give a positive momentum asymmetry, but are different at the
highest x (where the data constraint vanishes). This positive momentum asymmetry acts to reduce the
anomaly in the NuTeV measurement of $\sin^2\theta_W$ in neutrino DIS, which is in 3σ disagreement with the
world average [81, 82].

2.5 More Quark Constraints 

The final independent experimental constraint on the light quarks at moderate to large x comes from the
W asymmetry [83], or usually lepton asymmetry [84, 85], at the Tevatron $p\bar{p}$ collider. At LO

$$A_W(y) = \frac{d\sigma(W^+)/dy - d\sigma(W^-)/dy}{d\sigma(W^+)/dy + d\sigma(W^-)/dy} \approx \frac{u(x_1)d(x_2) - d(x_1)u(x_2)}{u(x_1)d(x_2) + d(x_1)u(x_2)}, \tag{67}$$

where $x_{1,2} = x_0\exp(\pm y)$, $x_0 = M_W/\sqrt{s}$. Since $u(x) > d(x)$ at large x, whereas they become roughly
equal at smaller x, $A_W(y)$ is positive for $x_1 > x_0 = 0.05$ ($y > 0$) (and at a proton-antiproton collider
the asymmetry is expected to be exactly antisymmetric about y = 0, so results in each half of the
detector can be combined). This helps pin down the u and d quarks in the region $x \sim 0.1$ as well as
giving compatible information to NMC and CCFR/NuTeV at higher x, and thus contributes to the
determination of the two valence quark distributions without any complications due to higher twists or
deuterium corrections.
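The sign and antisymmetry of the LO W asymmetry can be illustrated with toy quark shapes. In the sketch below both the quark distributions (a down quark that falls faster than the up quark at large x) and the input $M_W$ = 80.4 GeV are assumptions made purely for illustration.

```python
import math

X0 = 80.4 / 1960.0  # M_W / sqrt(s) at the Tevatron; M_W = 80.4 GeV is an assumed input

def u_toy(x):
    return 2.0 * x ** -0.5 * (1.0 - x) ** 3  # hypothetical up-quark shape

def d_toy(x):
    return 1.0 * x ** -0.5 * (1.0 - x) ** 4  # hypothetical down-quark shape, softer at large x

def w_asymmetry(y):
    """LO A_W(y) of Eq. (67) with x_{1,2} = x0 exp(+-y)."""
    x1, x2 = X0 * math.exp(y), X0 * math.exp(-y)
    num = u_toy(x1) * d_toy(x2) - d_toy(x1) * u_toy(x2)
    den = u_toy(x1) * d_toy(x2) + d_toy(x1) * u_toy(x2)
    return num / den

for y in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(y, w_asymmetry(y))
```

With these toy shapes the asymmetry reduces to $(x_1 - x_2)/(2 - x_1 - x_2)$, so it vanishes at y = 0, is positive for y > 0 and changes sign under $y \to -y$.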

In practice it is usually the final state leptons that are detected, so it is really the lepton asymmetry

$$A(y_l) = \frac{d\sigma(l^+)/dy_l - d\sigma(l^-)/dy_l}{d\sigma(l^+)/dy_l + d\sigma(l^-)/dy_l}, \tag{68}$$

which is measured, where $y_l$ is the rapidity of the charged lepton. Defining the angle of the lepton
relative to the proton beam in the W rest frame by $\cos^2\theta^* = 1 - 4E_T^2/M_W^2$ leads to

$$y_l = y_W + \tfrac{1}{2}\ln\frac{1 + \cos\theta^*}{1 - \cos\theta^*}. \tag{69}$$

Figure 18: LO diagram for prompt photon production (left), and comparison of the prediction
to D0 data on direct photon production [93] (right).



This means the valence-quark-only approximation to the lepton asymmetry can be inaccurate, particularly
near the edges of phase space, i.e. $\cos\theta^* \to \pm 1$, since sea-quark contributions can become
significant. Neglecting overall factors, the numerator of Eq. (68) can be approximated by (s, c and b
quark contributions are very small at Tevatron energies)

$$u(x_1)d(x_2)(1 - \cos\theta^*)^2 + \bar{d}(x_1)\bar{u}(x_2)(1 + \cos\theta^*)^2 - d(x_1)u(x_2)(1 + \cos\theta^*)^2 - \bar{u}(x_1)\bar{d}(x_2)(1 - \cos\theta^*)^2, \tag{70}$$

and for $\cos\theta^* \sim 1$ the leading ($\bar{d}\bar{u}$) sea-sea contribution is enhanced relative to the valence-valence
contribution by the large $(1 + \cos\theta^*)^2$ term arising from the V + A decay to leptons. The fit to the
data in [84] is shown in two $E_T$ bins in Fig. 17, including the consequence of ignoring the antiquark
contributions. In working back to the W-asymmetry in [83] information on parton distributions must
be used to infer the likelihood of a lepton having come from either quark-quark or antiquark-antiquark
annihilation. This implicitly loses some of the constraining power on the parton distributions, but
makes the comparison between the data and the parton distributions more transparent. There is newer
lepton asymmetry data from D0 [86], but this has not yet been used in global fits (see however [87, 88]),
partially because there is a question-mark about how good a fit is possible when all other data is
included.
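The size of the angular enhancement in Eq. (70) depends on the lepton $E_T$ through $\cos^2\theta^* = 1 - 4E_T^2/M_W^2$. The sketch below evaluates the two angular weights for a few $E_T$ values; $M_W$ = 80.4 GeV is an assumed input, the W is treated as exactly on shell, and the function names are hypothetical.

```python
import math

M_W = 80.4  # GeV, assumed input

def cos_theta_star(e_t, m_w=M_W):
    """Magnitude of cos(theta*) from cos^2(theta*) = 1 - 4 E_T^2 / M_W^2 (on-shell W assumed)."""
    return math.sqrt(max(0.0, 1.0 - 4.0 * e_t ** 2 / m_w ** 2))

def decay_weights(e_t):
    """The two angular factors (1 - cos)^2 and (1 + cos)^2 that multiply the terms of Eq. (70)."""
    c = cos_theta_star(e_t)
    return (1.0 - c) ** 2, (1.0 + c) ** 2

for e_t in (25.0, 35.0, 40.0):
    print(e_t, cos_theta_star(e_t), decay_weights(e_t))
```

At lower $E_T$ the two weights differ by a large factor, so the terms carrying $(1 + \cos\theta^*)^2$ are strongly favoured, whereas close to $E_T = M_W/2$ the weights become equal.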

2.6 The Gluon Distribution 

The above measurements constrain the high and moderate x quarks to a few percent or better. It is far
more difficult to obtain precise information on the form of the high-x gluon. In the early days of global
fits to parton distributions groups [12, 89] determined the gluon at high x via prompt or direct photon
production [90, 91], via the process shown in the left of Fig. 18. In principle this is a direct test of the
large-x gluon, with $x_T = 2p_T/\sqrt{s}$. However, $d^2\sigma/dE\,dp_T$ is sensitive to nonperturbative information about
the intrinsic $k_T$ of the gluons in the proton, to resummation of threshold logarithms, i.e. $\ln(1 - x_T)$ [92],





Figure 19: Some LO diagrams for jet production (left). The jet cross section fraction from
different contributions as a function of $p_T$ [120] (right; panel title: inclusive jet cross sections with MSTW 2008 NLO PDFs).



and to the interplay between the two. Also, some experiments probing similar regions of parameter
space give results which are difficult to reconcile without incorporating large changes in the corrections
to the calculations outlined above for only small changes in energy. Hence, due to intrinsic uncertainties
this process gives only a very rough indication of the gluon distribution. In order to be largely free from
the theoretical ambiguities the data should be at $p_T$ such that an uncertainty in $p_T$ of about 1 GeV is
not too important, i.e. at least a few tens of GeV. There is more recent data from the Tevatron [93]
which satisfies this. This is shown compared to a prediction in the right of Fig. 18. There is good
agreement, but the data is not precise enough to add any useful constraint. There is no obstacle to very
precise direct photon production data from the Tevatron or LHC being used to constrain the gluon in
the future [91].

The current best direct determination of the high-x gluon distribution is given by inclusive jet measurements
by D0 and CDF at the Tevatron, where they measure $d\sigma/dp_T\,dy$ and $p_T$ is the transverse
momentum of the jet. For run I [95, 96] for D0 and for run II [97, 98, 99] for both experiments the measurements
are in different bins of rapidity. This gives better coverage of x since non-zero rapidity leads
to asymmetric x values for the incoming partons. At central rapidity $x_T = 2p_T/\sqrt{s}$, and the measurements
extend up to $p_T \sim 500$ GeV, i.e. $x_T \sim 0.5$, and down to $p_T \sim 50$ GeV, i.e. $x_T \sim 0.05$.
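The statement that non-zero rapidity gives asymmetric x values can be made concrete with LO 2 → 2 kinematics for two massless jets of equal $p_T$. This is a sketch under that assumption; $\sqrt{s}$ = 1960 GeV corresponds to run II, and the function names and sample points are illustrative.

```python
import math

def x_t(p_t, sqrt_s=1960.0):
    """x_T = 2 p_T / sqrt(s) at central rapidity; p_t and sqrt_s in GeV."""
    return 2.0 * p_t / sqrt_s

def dijet_x(p_t, y1, y2, sqrt_s=1960.0):
    """LO 2 -> 2 kinematics for two massless jets of equal p_T and rapidities y1, y2."""
    half = 0.5 * x_t(p_t, sqrt_s)
    return half * (math.exp(y1) + math.exp(y2)), half * (math.exp(-y1) + math.exp(-y2))

print(x_t(500.0), x_t(50.0))     # central rapidity
print(dijet_x(100.0, 1.5, 1.5))  # both jets forward: one large and one small x
```

Binning in rapidity therefore extends the reach in x in both directions for a fixed range of $p_T$.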

At matrix element level, where some LO diagrams are shown in the left of Fig. 19, gluon-gluon fusion
dominates. However, the gluon distribution falls off more quickly as $x \to 1$ than quark distributions so
there is a transition from gluon-gluon fusion at small $x_T$, to gluon-quark at intermediate $x_T$, to quark-quark
at high $x_T$. However, even at the highest $x_T$ probed at the Tevatron, or likely to be probed at the
LHC, gluon-quark contributions are significant, as shown in the right of Fig. 19. This qualitative picture
is not altered beyond LO, but the calculation of the cross section becomes much more complicated. It is
aided immensely by the implementation of fastNLO [100], based on NLOJET++ [101, 102], which allows
the inclusion of the exact NLO hard cross section corrections to jet data in the fitting routine. (We
note that in [28] a similar numerical procedure is applied for Drell-Yan-type processes, while the more
general procedure is to use an x-dependent higher order K-factor. So long as the latter is tuned to the
parton distributions being used, as is usually the case, this should not lead to any noticeable inaccuracy.)
NNLO jet cross-sections are not known in full, but some threshold corrections are calculated and can
be applied [103], and are used in fastNLO. The LO → NLO cross section corrections are not very
large, in general ~20%, and the NNLO estimates are 5 - 10%, and in both cases the corrections are
smooth functions. This implies that missing NNLO corrections are similar in magnitude to correlated
uncertainties on the data.

Figure 20: Comparison of theory to data for the most recent D0 jet data [98], as absolute
values (left) and as a ratio of data to theory [26] (right).

For the run I data it was initially difficult to achieve a good fit at highest $E_T$, with theory undershooting
data, due to indirect constraints on the gluon from the sum rule or the limited sensitivity of
the high-x fixed target data to the gluon distribution. This was not a major problem and was solved
by including additional flexibility in the gluon parameterisation and by a proper treatment of the large
correlated systematic uncertainties on the jet data [104, 105, 106, 107], though in many cases some tension
between the fit to jet data and the best global fit was evident. The run II Tevatron jet data is
markedly softer at high $p_T$ than the run I data and consistency between the two is at best argued to
be marginal [108], and at worst poor [SB]. This makes the run II data easier to fit in comparison to the
other data. An illustration of the fit quality is shown in Fig. 20.

In principle there is also a direct constraint on the gluon distribution from jet data in deep inelastic
scattering at HERA [109, 110, 111, 112, 113], primarily in the range $0.01 < x < 0.1$. However, within
the context of a full global fit this does not add much extra information [26], actually being more important
in the constraint on $\alpha_s$ within a fit to parton distributions [120]. When added to a fit containing only
HERA structure function data it does have a significant impact [114], partially because it does constrain
the strong coupling better than the structure function data alone. This impact is shown in Fig. 21,
but the maximum effect is obtained by including data from photo-production as well as deep inelastic
scattering, which includes the additional complication of having to use fits for the parton distributions
of the photon. The NLO corrections are larger for HERA jet data than for Tevatron data, and there is
no approximation at NNLO. It is therefore difficult to include these data in a NNLO fit.

2.7 Small-x Parton Distributions 

All the above data constrain the partons mainly for x > 0.01 (though Tevatron Z-rapidity data extend
a little further, with diminishing statistical precision). The extension to the region of very low
x has been made in the past decade by HERA [115, 116, 117, 118, 119]. This region is very interesting
for the study of QCD. It is also vital for the LHC, as seen from the kinematic range illustrated in
Fig. 22. At the smallest x values an extrapolation from the measurements at low scales to the LHC
regions at higher scales is required.
In this region there is dramatic scaling violation of the partons from the evolution equations and
also a complex interplay between the quarks and gluons. The evolution equations for the singlet sector
are coupled

\frac{d\Sigma}{d\ln Q^2} = P_{qq} \otimes \Sigma + P_{qg} \otimes g

\frac{dg}{d\ln Q^2} = P_{gq} \otimes \Sigma + P_{gg} \otimes g     (71)

At very small x and at LO the splitting functions tend to an effective limit 

P",^— ^^^-^^V^l^id-x), (72) 

and so the gluon grows very quickly with increasing $Q^2$, while the quark distribution also grows quickly,
very largely driven by the gluon distribution. This means there is a fairly direct correlation between
$xg(x,Q^2)$ and $dF_2(x,Q^2)/d\ln Q^2$ at this order. At NLO the splitting functions as $x \to 0$ become

1 oil 1-6 

PL -0. 7a| - PL -^2Nf-A — . 73 

Hence, the gluon evolution is only slightly modified, whereas the quark evolution is greatly enhanced at
NLO. At this order $dF_2(x,Q^2)/d\ln Q^2$ is no longer directly proportional to $xg(x,Q^2)$, but is sensitive
to the gluon at all higher values of x. The very precise data has already been illustrated in the left of
Fig. 5 compared to the HERAPDF1.0 fit. This data is a direct constraint on the charge weighted quark
distributions down to x = 5 x 10~^ and an indirect, but still very precise constraint on the gluon. This 
is only for x values of about an order of magnitude higher, however. This is partly because the accurate
constraint on the gluon from evolution, i.e. from $dF_2(x,Q^2)/d\ln Q^2$, ceases when there is no longer a
fairly large number of points at a given x with different $Q^2$ values. It is also because the evolution
involves a convolution of the gluon and so probes slightly higher x values than the data, particularly at
NLO compared to LO.

At NNLO the small x splitting functions become, as $x \to 0$ [39]

, „ slnfl/x) . a|l.41n(l/x) 

So at NNLO the quark evolution at the smallest x values is enhanced yet again while the gluon evolution
is suppressed. It is known that at each subsequent order in $\alpha_s$ each splitting function and coefficient
function obtains an extra power of $\ln(1/x)$ [121, 122, 123] (there are some accidental zeros in $P_{gg}$), i.e.

P_{ij}(x,\alpha_s(Q^2)),\; C_i(x,\alpha_s(Q^2)) \propto \alpha_s^m(Q^2)\,\ln^{m-1}(1/x),     (75)

and hence the convergence at small x is questionable. The global fits usually assume that this turns out
to be unimportant in practice, and proceed regardless. The fit is quite good, but could be improved, as
illustrated in Fig. 23. As seen in this figure the evolution becomes slightly steeper with $Q^2$ in general
as the order increases, due to the extra contributions to the splitting functions discussed above, and
the fit becomes slightly better. Further implications for the small-x behaviour will be discussed later.
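
To make the convergence concern concrete, the following minimal sketch evaluates the size of the leading small-x terms, $\alpha_s^m \ln^{m-1}(1/x)$, ignoring all numerical prefactors. The value $\alpha_s = 0.2$ and the function names are illustrative assumptions, not taken from any fit. The ratio of successive terms is simply $\alpha_s \ln(1/x)$, so the series stops being obviously convergent once this quantity approaches one.

```python
import math


def log_enhanced_term(alpha_s: float, x: float, order: int) -> float:
    """Magnitude of alpha_s^m * ln^(m-1)(1/x) for m = order >= 1 (prefactors ignored)."""
    if order < 1:
        raise ValueError("order must be at least 1")
    return alpha_s ** order * math.log(1.0 / x) ** (order - 1)


def expansion_parameter(alpha_s: float, x: float) -> float:
    """Ratio of successive leading-log terms, alpha_s * ln(1/x)."""
    return alpha_s * math.log(1.0 / x)


if __name__ == "__main__":
    alpha_s = 0.2  # illustrative value only, not a fitted coupling
    for x in (1e-2, 1e-3, 1e-4, 1e-5):
        terms = [log_enhanced_term(alpha_s, x, m) for m in (1, 2, 3)]
        ratio = expansion_parameter(alpha_s, x)
        print(f"x={x:.0e}  alpha_s*ln(1/x)={ratio:.2f}  terms(m=1,2,3)={terms}")
```

With this illustrative coupling the effective expansion parameter is below one at x = 0.01 and above one at x = 0.0001. The real splitting functions carry coefficients (and accidental zeros) that this sketch deliberately omits.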

Very recently the H1 and ZEUS collaborations at HERA have combined their structure function
measurements into a single result [29]. This is shown in Fig. 24. This not only reduces the statistical
uncertainty, but has a far more dramatic effect on the correlated systematic uncertainties, which are
frequently far better understood for a given source by one collaboration, and can hence be reduced by
considerably more than the statistical error. This change in the correlated errors also means the average
Figure 24: An illustration of the combination of HERA data [29]. The open circles and
squares are the separate H1 and ZEUS data respectively and the closed points the combined
data.

Figure 25: $F_L(x,Q^2)$ data from H1 compared to various predictions [126, 127].



of two data points from the two collaborations is not simply the weighted average of the central
values of each. In particular, a better understanding of normalisations has moved the data upwards
slightly overall. This is reflected in slightly larger quark distributions in most small-x regions, most
particularly near x = 0.01. The data also leads to a reduction in uncertainty in parton distributions,
but this depends on the other data used in the fit and the procedure used.
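
The remark that a combination with correlated systematics is not the simple weighted average can be illustrated with a toy generalised least-squares average. Everything in this sketch is hypothetical: two kinematic points, two experiments "A" and "B", and invented numbers. It is not the procedure or the data of the actual HERA combination. Experiment A carries a fully correlated systematic shift common to both of its points, which enters the covariance matrix as an off-diagonal term.

```python
import numpy as np


def gls_average(y, design, cov):
    """Generalised least squares: minimise (y - A t)^T C^-1 (y - A t) over t."""
    cinv = np.linalg.inv(cov)
    fisher = design.T @ cinv @ design
    return np.linalg.solve(fisher, design.T @ cinv @ y)


def toy_combination():
    # Measurements ordered as [A point 1, A point 2, B point 1, B point 2] (invented numbers).
    y = np.array([1.00, 2.10, 1.10, 2.00])
    design = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
    unc = np.full(4, 0.05)                        # uncorrelated errors
    corr_a = np.array([0.10, 0.10, 0.0, 0.0])     # correlated systematic of experiment A only

    cov_full = np.diag(unc ** 2) + np.outer(corr_a, corr_a)   # keeps the correlation
    cov_naive = np.diag(unc ** 2 + corr_a ** 2)               # errors added in quadrature

    combined = gls_average(y, design, cov_full)
    naive = gls_average(y, design, cov_naive)
    return combined, naive


if __name__ == "__main__":
    combined, naive = toy_combination()
    print("with correlations :", combined)
    print("naive weighted avg:", naive)
```

In this toy case the two experiments disagree in opposite directions at the two points, which a common shift of experiment A cannot explain, so the correlated fit weights A more strongly than the quadrature treatment does and the averages differ.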

There is also now a direct HERA measurement of $F_L(x,Q^2)$ which is a new and direct constraint
on the small-x gluon. The original published data [124, 125] is not yet precise enough to add much
real extra constraint, but is a vital consistency check for the extracted gluon distribution. Existing
parton distributions match this data well. However, the preliminary lower-$Q^2$ data from H1 (see e.g.
[126, 127]) shown in Fig. 25 seem to be in excess of the predictions from the majority of fixed-order
parton distribution sets, and imply some additional physics at small x and/or low $Q^2$. The data have
very recently been finalised [128], and change slightly, but the same general conclusion holds. This will
be discussed briefly later.



2.8 Heavy Flavours - Quark Masses 

There is also data from HERA on the heavy flavour contribution to structure functions, i.e. on 
F|(x,Q2)ji29l [1301 iml [133 1133 [TMIIM and also including F!^{x , Q'^)^^^\. One might consider 
this to be a constraint on the heavy flavour distributions. However, these are generated entirely from 
evolution from the light partons, mainly the gluon. The gluon is constrained by the data in the previous 
subsections, so the heavy flavour structure functions are very largely a prediction. They are, however, a 
test of the quark masses, which set the boundary conditions for the heavy flavour evolution, and hence 
the size of the heavy quark PDFs, and of the theoretical procedure used to calculate heavy flavour 
structure functions. These will be discussed in more detail later. A comparison of theory to some of 
the most precise data is shown in Fig. 26.
Figure 26: A comparison of the calculation of charm F^ix-, Q"^) (left) and -^2^(0;, Q"^) (right)
using a variety of prescriptions and recent H1 data [136].



3 The Variety of Parton Distributions 

3.1 Different PDF Sets 

As outlined earlier there are a number of different groups which obtain full sets of parton distributions 
by fitting to structure function and other data. Some of the differences were touched upon in the 
previous section when discussing the data sets which can be included in a fit. In this section we will 
present the range of sets, and their similarities and differences rather more comprehensively. We choose 
not to dwell a great deal on the history of each, since in all cases the most up-to-date set is clearly the 
one which should be used, and in most cases there is a point before which there is a very good reason 
to no longer use the sets in a given series. 

The different sets and their most basic features are listed below. 

• The MSTW group is one of two which have been producing parton distributions for many years from
global fits to a wide variety of data. The group changed from MRST to MSTW in 2007, though
the first set produced by the group [138] maintained the MRST nomenclature. The most recent
set is MSTW2008 [26]. The group fits to essentially all the data sets listed in the previous section,
including the up-to-date Tevatron jet data and W and Z data. The fit does not include the most
recent HERA combination of structure function data (the effects have been investigated [139]) or
the HERA data on $F_L(x,Q^2)$, though it does include some fixed target data on $F_L(x,Q^2)$ [HI SSI
H3]. MSTW are the only group to include HERA jet data. The group produces PDFs at LO,
NLO and NNLO.

• The CTEQ group is the other which has been performing global fits for many years, and in
many ways has an approach which is very similar to that of MSTW. Again the group fits to the
vast majority of available data. The recent significant update in widest use is CTEQ6.6 [27]. This
is slightly older than the MSTW2008 sets and is not quite as up-to-date on Tevatron data and
likewise does not include the most recent HERA combination of structure function data, though
an updated set, CT10 [87], has appeared recently and includes these data sets. PDFs are made
available at NLO.

• The NNPDF group uses a rather distinct procedure, as will be explained later in this section. It
has continually been developing for the past few years, but with NNPDF2.0 [28] (and extremely
recently NNPDF2.1 [140]), which includes Tevatron data, they have reached the status of a global
fit. Previous sets are based on rather smaller data sets, either mainly or entirely structure function
data. The NNPDF2.0 fit includes all the data discussed above except HERA jet data and heavy 
flavour structure functions. It is sufficiently recent that it does include the HERA combined data, 
and notices a moderate effect compared to the original individual data sets, most noticeably a 
smaller uncertainty in the gluon and singlet quarks below x = 5 x 10"'^. PDFs are made available 
at NLO. 

• There have been a variety of fits performed by the H1 [59] and ZEUS fTiTl 1114] collaborations.
These have sometimes included fixed target structure function data |141[ l5^ or HERA jet data [114].
More recently, in order to analyse the combined HERA structure function data the fitting groups
have also converged. The HERAPDF1.0 [29] PDFs are based entirely on HERA inclusive structure
function data, both neutral and charged current. PDFs are produced at NLO. A preliminary
update, including NNLO results, is in [142].

• The ABKM group provides a continuation of the fits performed in [143, 144, 145]. The first set of
PDFs obtained by the combined group, ABKM09 [30], comes from a fit to structure function, fixed
target Drell-Yan, and dimuon data. No Tevatron data is included (though preliminary results for
some jet data sets can be seen in [146]). PDFs are produced with both NLO and NNLO evolution.
There is a preliminary update in [147]. A parameterisation of $\mathcal{O}(1/Q^2)$ power corrections is
employed at low $W^2$ rather than the more common kinematic cut.

• The GJR [31], or dynamical, parton distributions are based on the idea, originally advocated in
[148], that the PDFs are generated from a valence-like input form at some very low starting scale
$Q_0^2 < 0.5\,\mathrm{GeV}^2$. They are obtained from a fit to structure function, fixed target Drell-Yan and
Tevatron jet data. Sets from more conventional starting distributions are also obtained, though
not advocated by the authors (despite providing much better fit quality and a more conventional
value of the strong coupling). PDFs are made available with LO, NLO and NNLO [132] evolution.
At NLO, PDFs are also made available in the DIS factorisation scheme as well as the $\overline{\mathrm{MS}}$ scheme
used by all groups. This has been done in the past by some others, e.g. [SH 1150]. In practice
the transformation rule in [25] (and defined up to NNLO in [151]) could be used to make the
transformation between the two.

The evolution of the partons in each set should be exactly the same up to small uncertainties. A cross-check
of PDF evolution was first performed in [152], leading to the correction of bugs causing errors up to
a couple of percent. Now there is much better agreement. Benchmark tables have been constructed [153]
using publicly available codes [154, 155], and checked against some sets (see e.g. [156, 25]) and agreement
to much better than 1% is found. However, the PDF sets obtained differ by much more than this. As
outlined above this can be due to the choices of data sets and kinematic cuts made, but can also occur
for a large number of other reasons. This will be discussed in detail in the remainder of this section.
Figure 27: The best value of $\sigma_W$ and the uncertainty using a fit deterioration of $\Delta\chi^2 = 1$
for each data set in the CTEQ fit (left) and the 90% confidence limits for each data set as
a function of $\sqrt{\Delta\chi^2}$ for one eigenvector (right) [157].



3.2 Parton Fitting and Uncertainties 

There are two main approaches to obtaining the most likely PDF sets and the uncertainty. One, the 
original approach, is to find the PDFs by the best fit to the data, and then to perturb about this in 
some fashion. The other is to obtain an ensemble of different PDF sets and to find the most likely 
result by averaging and the uncertainty from the deviation of the predictions from the mean. This will 
be discussed in more detail below. 

In the former case the quality of the fit is traditionally determined by the $\chi^2$ of the fit to data,
which may be calculated in various ways. The simplest is to add statistical and systematic errors
in quadrature, which ignores correlations between data points, but is often quite effective. Also, the
information on the data means that sometimes only this method is available. To be more complete, one
uses the full covariance matrix, which is constructed as

C_{ij} = \delta_{ij}\,\sigma_{i,\mathrm{stat}}^2 + \sum_{k=1}^{n} \rho^k_{ij}\,\sigma_{k,i}\,\sigma_{k,j}, \qquad \chi^2 = \sum_{i=1}^{N}\sum_{j=1}^{N} (D_i - T_i(a))\, C^{-1}_{ij}\, (D_j - T_j(a)),     (76)

where k runs over each source of correlated systematic error, $\rho^k_{ij}$ are the correlation coefficients, N is the
number of data points, $D_i$ is the measurement and $T_i(a)$ is the theoretical prediction depending on parton
input parameters a. An alternative that produces identical results in the quadratic approximation is to
incorporate the correlated errors into the theory prediction

f_i(a, s) = T_i(a) + \sum_{k=1}^{n} s_k \Delta_{ik}, \qquad \chi^2 = \sum_{i=1}^{N} \left(\frac{D_i - f_i(a,s)}{\sigma_{i,\mathrm{unc}}}\right)^2 + \sum_{k=1}^{n} s_k^2,     (77)

where $\Delta_{ik}$ is the one-sigma correlated error for point i. One can solve analytically for the $s_k$ [157].
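
The equivalence of the covariance-matrix form, Eq. (76), and the nuisance-parameter form, Eq. (77), can be checked numerically. The sketch below assumes each systematic source is fully correlated between points, so that $\rho^k_{ij}\sigma_{k,i}\sigma_{k,j} = \Delta_{ik}\Delta_{jk}$, and uses the same uncorrelated error in both forms. The function names and the toy numbers are illustrative assumptions. The shifts $s_k$ are obtained analytically by solving the linear system that follows from setting the derivative of Eq. (77) with respect to each $s_k$ to zero.

```python
import numpy as np


def chi2_covariance(data, theory, sigma_unc, delta):
    """chi^2 = r^T C^-1 r with C_ij = delta_ij sigma_i^2 + sum_k Delta_ik Delta_jk."""
    r = data - theory
    cov = np.diag(sigma_unc ** 2) + delta @ delta.T
    return float(r @ np.linalg.solve(cov, r))


def chi2_nuisance(data, theory, sigma_unc, delta):
    """Profile the systematic shifts s_k analytically, then evaluate the penalised chi^2."""
    r = data - theory
    w = 1.0 / sigma_unc ** 2
    # Stationarity in s gives (1 + Delta^T W Delta) s = Delta^T W r.
    lhs = np.eye(delta.shape[1]) + delta.T @ (w[:, None] * delta)
    rhs = delta.T @ (w * r)
    s = np.linalg.solve(lhs, rhs)
    resid = r - delta @ s
    return float(np.sum(w * resid ** 2) + s @ s), s


if __name__ == "__main__":
    data = np.array([1.02, 0.97, 1.10, 0.95])       # toy measurements D_i
    theory = np.ones(4)                             # toy predictions T_i(a)
    sigma_unc = np.array([0.03, 0.04, 0.05, 0.03])  # uncorrelated errors
    delta = np.array([[0.02, 0.01],                 # Delta_ik for two systematic sources
                      [0.02, -0.01],
                      [0.03, 0.00],
                      [0.01, 0.02]])
    c2_cov = chi2_covariance(data, theory, sigma_unc, delta)
    c2_nui, shifts = chi2_nuisance(data, theory, sigma_unc, delta)
    print(c2_cov, c2_nui, shifts)
```

The two values of $\chi^2$ agree to numerical precision, while the nuisance form additionally returns the fitted shifts, which is often useful for diagnosing which systematic source a fit is pulling on.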

Having defined the fit quality there are a number of different approaches for obtaining parton
uncertainties. The most common is the Hessian (Error Matrix) approach. One defines the Hessian
matrix by

\chi^2 - \chi^2_{\min} \equiv \Delta\chi^2 = \sum_{i,j} H_{ij}\,(a_i - a_i^{(0)})(a_j - a_j^{(0)}).     (78)

One can then use the standard formula for linear error propagation:

(\Delta F)^2 = \Delta\chi^2 \sum_{i,j} \frac{\partial F}{\partial a_i}\,(H)^{-1}_{ij}\,\frac{\partial F}{\partial a_j}.     (79)

This was used to find partons with uncertainties in e.g. [115, 143]. In practice this can be problematic
due to extreme variations in $\Delta\chi^2$ in different directions in parameter space. This can be improved by
Figure 28: The tolerance as a function of eigenvector number for the MSTW2008 NLO
PDFs [26]. The outer band is 90% confidence level and the inner band 68%. The label at
each end of the bar is the data set providing the main constraint to that eigenvector in each
direction.



finding and rescaling the eigenvectors of H, a method developed by CTEQ [158, 159], and now used
by other groups. In terms of the rescaled eigenvectors $z_i$, which are orthonormal combinations of the
$a_i - a_i^{(0)}$, the increase in $\chi^2$ is given simply by

\chi^2 - \chi^2_{\min} \equiv \Delta\chi^2 = \sum_i z_i^2,     (80)

i.e. constant $\Delta\chi^2$ is the surface of a hypersphere in the space of the parton parameter eigenvectors.
The uncertainty on a physical quantity is then obtained using Pythagoras' theorem,

{^Fr = lj:iFiSl^^)-FiSt^))"^ (81) 

where $S_i^{(+)}$ and $S_i^{(-)}$ are PDF sets displaced along eigenvector directions by the given $\Delta\chi^2$.
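
The eigenvector procedure can be sketched for a toy two-parameter problem. The code assumes the quadratic form of Eq. (78), with no factor of 1/2 in the definition of H, and uses the widely quoted symmetric form $\Delta F = \frac{1}{2}\sqrt{\sum_i [F(S_i^{(+)}) - F(S_i^{(-)})]^2}$ for the uncertainty. That prefactor convention, the function names and the toy Hessian are assumptions of this sketch. For an observable that is linear in the parameters, the eigenvector result must reproduce the linear propagation formula of Eq. (79).

```python
import numpy as np


def eigenvector_sets(a0, hessian, delta_chi2=1.0):
    """Parameter sets displaced along each eigenvector of H to chi^2 - chi^2_min = delta_chi2."""
    evals, evecs = np.linalg.eigh(hessian)
    t = np.sqrt(delta_chi2)
    steps = t * evecs / np.sqrt(evals)   # column i: displacement giving z_i = t, others zero
    plus = [a0 + steps[:, i] for i in range(len(a0))]
    minus = [a0 - steps[:, i] for i in range(len(a0))]
    return plus, minus


def symmetric_uncertainty(observable, plus, minus):
    """Delta F = 0.5 * sqrt(sum_i (F(S_i+) - F(S_i-))^2)."""
    diffs = np.array([observable(p) - observable(m) for p, m in zip(plus, minus)])
    return float(0.5 * np.sqrt(np.sum(diffs ** 2)))


def linear_propagation(grad, hessian, delta_chi2=1.0):
    """Delta F from Eq. (79): sqrt(delta_chi2 * grad^T H^-1 grad)."""
    return float(np.sqrt(delta_chi2 * grad @ np.linalg.solve(hessian, grad)))


if __name__ == "__main__":
    hessian = np.array([[4.0, 1.0], [1.0, 3.0]])   # toy Hessian
    a0 = np.array([1.0, 2.0])                      # toy best-fit parameters
    grad = np.array([0.5, -1.5])                   # dF/da for a linear observable
    plus, minus = eigenvector_sets(a0, hessian, delta_chi2=10.0)
    dF = symmetric_uncertainty(lambda a: float(grad @ a), plus, minus)
    print(dF, linear_propagation(grad, hessian, delta_chi2=10.0))
```

In a real analysis the observable is evaluated with each published eigenvector PDF set rather than from a gradient, which is exactly why the eigenvector representation is convenient for users.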

One can also investigate the uncertainty on a given physical quantity using the Lagrange Multiplier
method, first suggested by CTEQ [157] and also investigated in some detail by MRST [160]. One performs
the global fit while constraining the value of some physical quantity, i.e. minimise

\Psi(\lambda, a) = \chi^2_{\mathrm{global}}(a) + \lambda F(a)     (82)

for various values of $\lambda$. This gives the set of best fits for particular values of the parameter F(a) without
relying on the quadratic approximation for $\Delta\chi^2$, but has to be done anew for each quantity.
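
A minimal sketch of a Lagrange multiplier scan follows. To keep it self-contained it assumes a purely quadratic toy $\chi^2$ and a linear observable, for which the minimum of $\Psi$ can be written in closed form. In a real global fit each value of $\lambda$ requires a full numerical refit, and the point of the method is precisely that no quadratic assumption is needed. The function name and toy inputs are hypothetical.

```python
import numpy as np


def lagrange_scan(a0, hessian, grad, lambdas):
    """For chi^2 = chi^2_min + (a - a0)^T H (a - a0) and F = grad . a,
    minimise Psi = chi^2 + lambda * F for each lambda.

    Returns a list of (F, delta_chi2) pairs tracing out chi^2 as a function of F.
    """
    hinv_g = np.linalg.solve(hessian, grad)
    scan = []
    for lam in lambdas:
        a = a0 - 0.5 * lam * hinv_g       # from dPsi/da = 2 H (a - a0) + lambda * grad = 0
        d = a - a0
        scan.append((float(grad @ a), float(d @ hessian @ d)))
    return scan


if __name__ == "__main__":
    hessian = np.array([[4.0, 1.0], [1.0, 3.0]])
    a0 = np.array([1.0, 2.0])
    grad = np.array([0.5, -1.5])
    for f_val, dchi2 in lagrange_scan(a0, hessian, grad, lambdas=[-4.0, -2.0, 0.0, 2.0, 4.0]):
        print(f"F = {f_val:+.4f}   delta chi^2 = {dchi2:.4f}")
```

For this quadratic toy the scan reproduces the parabola implied by Eq. (79). Departures from a parabola in a real scan signal where the Hessian approximation is breaking down.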

In each approach there is uncertainty in choosing the "correct" $\Delta\chi^2$. In principle this should be
one unit, and some groups with a smaller number of data sets use this. However, given the complications
within a full global fit this gives unrealistically small uncertainties. This can be seen in the left of Fig. 27,
where the variation in the predictions for $\sigma_W$ using $\Delta\chi^2 = 1$ for each data set has an extremely wide
Figure 29: An illustration of the fit quality to the training (labelled $E_{tr}$) and validation
set (labelled $E_{val}$) for NMC data in the NNPDF fit fSH]. The constant line labelled $E_{target}$
determines the weight of this data set in the fit.



scatter compared to the uncertainty. CTEQ choose $\Delta\chi^2 \sim 100$. The 90% confidence limits for the fits
to the larger individual data sets when $\sqrt{\Delta\chi^2}$ in the CTEQ fit is increased by a given amount are shown
in the right of Fig. 27. As one sees, a couple of sets may be some way beyond their 90% confidence limit
for $\Delta\chi^2 = 100$. The MRST group instead chose $\Delta\chi^2 = 50$ to represent the 90% confidence limit for the
fit [160]. However, the most recent fit [26] recognised that some eigenvectors are constrained by many
fewer data points than others, and modified the prescription to give a so-called "dynamical tolerance"
where the $\Delta\chi^2$ depends on the eigenvector (and on the two orientations of the eigenvector). The values
of $\sqrt{\Delta\chi^2}$ for the NLO MSTW fit are shown in Fig. 28. For 68% confidence level they are usually of
magnitude 2-4, suggesting, on average, a $\Delta\chi^2$ of about 10 for one-$\sigma$ uncertainties, somewhat smaller
than previous MRST, and certainly CTEQ values. There has been recent work on how the increase in
$\Delta\chi^2$ may be related to inconsistency of data sets [161] and limitations of a fixed number of parameters
for input parton distributions [162], both in the context of the CTEQ global fit. The former implies that
data set inconsistency should give $\Delta\chi^2 \sim 4$ for one $\sigma$, though this might be greater if more data sets or
less conservative cuts are used than in the CTEQ fit. The latter implies a relatively similar factor for
the increase in $\chi^2$. In regions where there is little data constraint and PDFs are constrained by their
limited parameterisation, e.g. very high x or very small x for valence quarks (and the gluon using some
parameterisations) this can be due to quite large changes in PDFs making rather little difference to the
fit quality. Where there is constraining data it seems more likely to be the case only if a noticeably
better fit can be found by extending the parameterisation, an unsurprising result which then depends
on how well the original parameterisation is performing.

There are other approaches to finding the uncertainties. In the offset method the best fit is obtained
by minimising the $\chi^2$ using only uncorrelated errors. The systematic errors on the parton parameters
are then determined by letting each $s_k = \pm 1$ and adding the deviations in quadrature. This method was
used in some previous ZEUS fits [114], and is used for the three "procedural" systematic uncertainties
in the HERAPDF1.0 fit [29].
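
The offset method can be sketched with the simplest possible "fit", a single constant fitted to a handful of points, standing in for the full set of parton parameters. This is a toy under stated assumptions: the function names and numbers are hypothetical, and the refit for each $s_k = \pm 1$ is here just a weighted mean of the shifted data.

```python
import numpy as np


def weighted_mean(data, sigma_unc):
    """Best-fit constant using only uncorrelated errors."""
    w = 1.0 / sigma_unc ** 2
    return float(np.sum(w * data) / np.sum(w))


def offset_method(data, sigma_unc, delta):
    """Return (central value, uncorrelated error, offset-method systematic error).

    delta[i, k] is the one-sigma correlated error of source k on point i.
    """
    central = weighted_mean(data, sigma_unc)
    stat = float(1.0 / np.sqrt(np.sum(1.0 / sigma_unc ** 2)))
    shifts = []
    for k in range(delta.shape[1]):
        up = weighted_mean(data + delta[:, k], sigma_unc) - central     # s_k = +1
        down = weighted_mean(data - delta[:, k], sigma_unc) - central   # s_k = -1
        shifts.append(0.5 * (up - down))
    syst = float(np.sqrt(np.sum(np.square(shifts))))   # deviations added in quadrature
    return central, stat, syst


if __name__ == "__main__":
    data = np.array([1.0, 1.2])
    sigma_unc = np.array([0.1, 0.1])
    delta = np.array([[0.05], [0.05]])   # one fully correlated source
    print(offset_method(data, sigma_unc, delta))
```

Because the central fit never sees the correlated errors, the offset method does not let the data constrain the systematic shifts, which is why it tends to give more conservative uncertainties than the Hessian treatment.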

There is also the statistical approach used by the Neural Network group. Here one constructs a set of
Monte Carlo replicas $\sigma^{(k)}(p_i)$ of the original data set $\sigma^{(\mathrm{data})}(p_i)$, which gives a representation of $P[\sigma(p_i)]$ at
points $p_i$. Then one obtains a parton distribution function for each replica, obtaining a representation
of the PDFs $q_i^{(k)}$. The set of PDF replicas obtained is a representation of the probability density, i.e.
the mean $\mu_O$ and deviation $\sigma_O$ of an observable O are given by

\mu_O = \frac{1}{N_{\mathrm{rep}}} \sum_{k=1}^{N_{\mathrm{rep}}} O[q_i^{(k)}], \qquad \sigma_O^2 = \frac{1}{N_{\mathrm{rep}} - 1} \sum_{k=1}^{N_{\mathrm{rep}}} \left(O[q_i^{(k)}] - \mu_O\right)^2.     (83)

One can incorporate full information about measurements and their error correlations in the distribution
of $\sigma^{(k)}(p_i)$. This does not rely on the approximation of linear propagation of errors (though the data
replicas assume a Gaussian distribution) but is more time intensive. This basic idea was proposed in
[163, 164], but was performed using standard input parameterisations for PDFs.
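
The replica procedure can be sketched in a few lines. In this toy, which is an illustration under stated assumptions and not the NNPDF implementation, the "PDF fit" is again a single constant fitted to each replica of the data, and the observable is that fitted constant. The function names, the seed and the numbers are hypothetical. Data replicas are drawn from a multivariate Gaussian with the experimental covariance matrix, and the mean and deviation of the observable follow Eq. (83).

```python
import numpy as np


def make_replicas(data, cov, n_rep, seed=0):
    """Draw n_rep Gaussian replicas of the data with the given covariance matrix."""
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(data, cov, size=n_rep)


def fit_constant(replica, sigma):
    """Stand-in for a PDF fit: weighted mean of one data replica."""
    w = 1.0 / sigma ** 2
    return float(np.sum(w * replica) / np.sum(w))


def replica_statistics(values):
    """Mean and deviation over replicas, using the 1/(N_rep - 1) normalisation."""
    values = np.asarray(values, dtype=float)
    return float(values.mean()), float(values.std(ddof=1))


if __name__ == "__main__":
    data = np.array([1.0, 1.1, 0.9, 1.05, 0.95])   # toy central values
    sigma = np.full(5, 0.1)                        # toy uncorrelated errors
    replicas = make_replicas(data, np.diag(sigma ** 2), n_rep=1000, seed=1)
    fitted = [fit_constant(r, sigma) for r in replicas]
    mu, dev = replica_statistics(fitted)
    print(f"mean = {mu:.4f}, deviation = {dev:.4f}, analytic = {0.1 / np.sqrt(5):.4f}")
```

For this linear toy the replica deviation converges to the analytic error on the weighted mean. The value of the method lies in cases where the fit is non-linear and no such closed form exists.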

The NNPDF group [THSl [ISSl EB] has developed this philosophy and combined it with the standard
input parameterisations being replaced by a neural network, or effectively a very much larger number
of parameters than any other group (though there is some pre-processing which leads to the limits
of $x \to 0$ and $x \to 1$ being related to the usual power-law forms in $x$ and $(1-x)$ respectively). In principle
this means that if the fit to data is left to converge for too long a time the input will start to fit to
fluctuations in the data. This is avoided by fitting to a training set and comparing to a validation
set, each comprising half the data. The fit is then stopped when the quality of the fit to the training
set may still be slowly improving, but that to the validation set starts to deteriorate. This is one of the
main sources of complication, and where there has been continual development. Ideally each data set
within the global fit will reach its stopping point at the same time. In practice some will tend to do so
long before others and their validation sets can be progressively fit worse while the quality of the fit to
the global validation set is still improving. Stopping while the global validation set is still improving
is probably somewhat analogous to the requirement for the other "global" fits to use an inflated $\Delta\chi^2$
when perturbing about the best global fit. The NNPDF group have adopted a procedure called "target
weighted training" [28] to minimise this problem, where the different data sets have different weights,
which are determined iteratively, and aid the convergence of each set reaching the appropriate stopping
point at the same time. The quality of the fit for the training and validation sets is shown for one
data set in Fig. 29. This new procedure leads to some data sets in [28] having a better fit quality
than previous NNPDF fits, particularly fixed target structure function data. It also leads to generally
smaller uncertainties, perhaps being nearer in some sense to the criterion $\Delta\chi^2 = 1$ in the alternative
procedure, than previous stopping criteria, as seen in Fig. 23 of [28]. The most recent set has also
included changes in the treatment of data set normalisations [TB^, so it is difficult to appreciate the
change in PDFs between [28] and the previous set [167] due to the inclusion of new data by comparing
the two sets. Helpfully, there are various illustrations of the effect of particular data sets in [28].
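
The idea of stopping when the validation quality starts to deteriorate can be sketched generically. The function below is a plain look-back ("patience") criterion, chosen for illustration only. It is not the NNPDF stopping algorithm, and the name `stopping_epoch`, the `patience` parameter and the error sequence are hypothetical.

```python
def stopping_epoch(validation_errors, patience=3):
    """Return the epoch with the lowest validation error seen before the error
    has failed to improve for `patience` consecutive epochs."""
    best_epoch = 0
    best_error = float("inf")
    for epoch, err in enumerate(validation_errors):
        if err < best_error:
            best_error = err
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            break   # validation error has stopped improving: stop training here
    return best_epoch


if __name__ == "__main__":
    # Toy validation error: improves, then deteriorates while training would keep improving.
    validation = [1.00, 0.80, 0.70, 0.65, 0.66, 0.68, 0.70, 0.60]
    print(stopping_epoch(validation, patience=3))
```

With a short look-back window the search stops at the first clear minimum and never sees the later improvement. A longer window finds it but trains for longer, which is the kind of trade-off that makes the choice of stopping criterion a source of continual development.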

To summarise, the procedure used to determine the uncertainty for each group is: 

• MSTW08 perturb around the best fit using 20 orthonormal eigenvectors. Older data sets on
structure functions use data averaged over different energies and combine uncertainties for structure
function data in quadrature, other than those on normalisation. It has been checked in
[160, 168, 169] that this has a small effect on the central values and uncertainties of the PDFs
(in the last, changes of up to $\sigma/4$ only were found), though it clearly affects the value of the $\chi^2$.
Due to incompatibility of different sets, imperfect theory, and (to some extent) parameterisation
inflexibility MSTW have an inflated $\Delta\chi^2$ of ~ 5-20 for one-$\sigma$ uncertainty for the eigenvectors, the
value being determined independently for each eigenvector and direction. Data set normalisation
uncertainties are included in the determination of the best fit and uncertainties, though a quartic
penalty is applied to these to minimise a drift to slightly low values, most notably in the LO fit
where the theory is systematically low compared to data.

• CTEQ6.6 perturb around the best fit using 22 orthonormal eigenvectors, using $\Delta\chi^2$ of 100 for 90%
confidence level for the eigenvectors. There is some unspecified weighting of data sets in the global
fit. Data normalisation uncertainties are not included in the determination of the uncertainties,
Figure 30: Uncertainty on the up quark (left) and down quark (right) from the CTEQ6.6
and MSTW08 PDF sets [26].



which might be an additional reason for the large tolerance. CT10 uses 26 eigenvectors and an
improved method of uncertainty determination, but still with a similar $\Delta\chi^2$.

• For NNPDF2.0 the uncertainty is determined by the deviation of either 100 or 1000 PDF replicas 
where all data uncertainty information has gone into generating the data and consequently PDF 
replicas. The "best fit" could be taken as the average of the replicas, but there is also one PDF 
replica which has been fit to the central value of all data points, as in the other procedures. The 
direct relationship to $\Delta\chi^2$ in alternative global fits is not trivial.

• HERAPDF1.0 perturb about the best fit using 9 orthonormal eigenvectors. Most (110) systematic
uncertainties are combined in quadrature with the statistical uncertainties. Since the data comes
from one combined self-consistent set, $\Delta\chi^2 = 1$ is used to determine the uncertainties from this
source. The three procedural (and largest) systematic uncertainties are added using the offset
method. Hence, the uncertainty is "slightly" more conservative than a use of $\Delta\chi^2 = 1$ incorporating
all uncertainties. Since the default number of input parameters is small an additional
parameterisation uncertainty is included by adding various other parameters one at a time, and
also by changing the starting scale for the evolution. Additional variations in strange sea fraction,
data cuts and quark masses are included.

• ABKM uses perhaps the most straightforward and conventional method of determining the PDF 
uncertainties. They perturb about the best fit using 21 parton parameters and also include heavy 
quark masses and the strong coupling as free parameters. They publish the correlation matrix of 
the fitted parameters. The strict criterion of $\Delta\chi^2 = 1$ is used for uncertainty determination.

• The GJR08 set is also based on a perturbation about the best fit. There are 20 parton parameters 
and the strong coupling is also varied when determining the uncertainty. They use $\Delta\chi^2 = 22$ in
order to define a one-$\sigma$ uncertainty, and seemingly add statistical and systematic uncertainties in
quadrature for all data sets, including Tevatron data. The fact that they impose a strong theory 
constraint on the input form of PDFs results in a reduced uncertainty in the small-x singlet 
distributions, particularly the gluon distribution. The error bands of their default "dynamical" 
PDFs do not always overlap with those in their "standard" determination, which uses a starting 
scale and parameterisation more similar to other groups. 

Perhaps surprisingly, despite the very widely differing procedures for uncertainty determination 
described above, all PDF sets obtain rather similar uncertainties for the PDFs and predicted cross- 
sections. In fact the agreement in this respect is probably better than might be expected given that 
Figure 31: A comparison [30] of the ABKM09 PDFs with uncertainty bands to the MSTW08
central fit (dashed lines).

Figure 32: A comparison [31] of the GJR08 PDFs, both the recommended "dynamical set"
(dynNLO) and the more conventional "standard set" (stdNLO) to CTEQ6.



some sets contain considerably fewer data constraints than others. This latter impression is reinforced
by the fact that the central values of the PDF sets do actually show some significant deviations, which
are greater than the individual uncertainties, as seen in Figs. 31 and 32. As an example of some of
the best agreement, in Fig. 30 we show the 90% confidence level uncertainty on the MSTW08 NLO u
and d distributions, along with CTEQ6.6, where the central line for the latter represents the ratio of the
CTEQ PDF to that of MSTW. There is clearly reasonable agreement between the two groups.

The predictions at NLO for all the PDF sets for W and Z cross-sections at the LHC at 7 TeV centre
of mass energy, with common fixed order QCD and vector boson width effects, and common branching
ratios are shown in the left of Fig. 33 (some similar results at NNLO can be found in [171]). There
is fairly good agreement. However, there is a 3-4 $\sigma$ difference between the extreme results. There is as
much variation in the absolute cross sections for $W^+$ and $W^-$ in the right of Fig. 33, but here there
is also some significant variation in the ratio. These particular cross sections are primarily sensitive to
the quark distributions in the region x = 0.01, so the inclusion or not of the combined HERA structure
function data could have some effect. However, the two PDFs which include these, HERAPDF1.0
and NNPDF2.0, are two of the furthest apart. Hence, there must be implicit differences in the PDFs
obtained by groups, which for at least some cannot be fully reflected in the size of the uncertainties
even with such measures as inflated $\Delta\chi^2$ etc. One of the obvious examples is the theoretical constraint
in the GJR "dynamical PDFs" [31], which have a hypothesis for the starting distributions not shared
Figure 33: W and Z cross section predictions for the LHC (left) and $W^+$ and $W^-$ cross
section predictions for the LHC (right) using the PDFs from the different groups. Plots by
G. Watt [170].



by other groups, but there are other, less obvious examples. 



3.3 Parameterisations 

One of the obvious sources of differences between the different PDF sets is their parameterisations.
Consider the example of the $\sigma(W^+)/\sigma(W^-)$ ratio already illustrated in the right of Fig. 33. The MSTW08
prediction for this ratio has a very small quoted uncertainty of ~ 0.8% for this[26]. The predic- 
tion is sensitive to the u and d quarks - ~ ~ 1(1} ' "W'here in the last step we assume 
$u(x) \to d(x)$ as $x \to 0$, which data implies, and most parameterisations assume. Hence, this ratio is
sensitive to flavour in the proton, and to valence quarks at x = 0.01, where they are a small, but
still significant contribution to the total quark distributions (this is more clear in the asymmetry of
$\sigma(W^+)$ and $\sigma(W^-)$ [172]). The valence quarks of various groups can be seen, along with other PDF
comparisons, in Fig. 34, and there are appreciable differences at x < 0.1. It might be thought that
since the small-x valence quarks are only weakly constrained by data this variation in predictions is due
to the valence quarks being overly constrained by a limited parameterisation. This was implied by the
uncertainties in the earlier NNPDF sets, e.g. [165, 156], but as we see in Fig. 35, in the fully global fit,
the NNPDF2.0 valence distribution, which has a much more flexible parameterisation, is no more uncertain
than that of CTEQ and MSTW. This is also clear from the uncertainties in the right of Fig. 33.
Hence, PDF sets have differences in their valence quarks which are not entirely due to parameterisation
inflexibility (though this may still play a part, perhaps affecting even NNPDF2.0 to some small extent
due to the preprocessing). It is true that they include different data sets, but the uncertainties should
account for this. Differences are more likely to be due to generally unconsidered reasons such as implicit
assumptions on nuclear corrections to neutrino DIS data, or deuterium corrections, which it is difficult
to account for. This not easily explained discrepancy between predictions for $\sigma(W^+)/\sigma(W^-)$ makes
early measurements of this quantity at the LHC particularly interesting. This shows that although it
might be tempting to assign differences in predictions or PDFs which are larger than uncertainties to
limited flexibility in parameterisations, there is not always much evidence that this is true. There are,
however, a few explicit examples where the limitations on PDF parameterisations do affect the central
Figure 34: The HERA PDF compared to CTEQ (left) and MSTW (right) [126].

Figure 35: The valence quark combination from various groups and its uncertainty [28].



values, but sometimes more particularly the size of the uncertainties of the PDFs 

One clear example of this is the gluon parameterisation at small x. In this case different
parameterisations can lead to a different central value and also a very different uncertainty for the small-x
gluon distribution. This can be seen in Fig. |3l] where the MSTW gluon distribution at low x is a
little smaller but where the gluon uncertainties have a very different shape (though the HERAPDF1.0
gluon has a clearly smaller uncertainty in magnitude due to inclusion of combined data and a more
stringent tolerance). Most parameterisations assume the gluon behaves like a single power $x^{\lambda}$ at input. If
$g(x) \propto x^{\lambda \pm \Delta\lambda}$ then $\Delta g(x) = \Delta\lambda \ln(1/x)\, g(x)$. So this form of parameterisation by definition leads to
a limited fractional uncertainty, growing fairly slowly as x becomes very small. This is represented by
the "Alekhin" curve in Fig. 36. The HERAPDF1.0 and ABKM gluon uncertainties would have a similar
shape. If the input for the gluon at low $Q^2$ actually has $\lambda$ positive then the small-x input gluon is
effectively fine-tuned to be $\sim 0$. In this case the very small-x gluon at higher scales is entirely generated
by evolution from the more precisely determined gluon at higher x values and the uncertainty is even
smaller, as in the "CTEQ" curve in the left of Fig. 36. The GJR uncertainty would be of a similar
form. The MSTW and NNPDF parameterisations are more flexible (the gluon can be negative) and
this leads to a smaller distribution at the lowest x and a rapid expansion of the uncertainty where the
data constraint runs out. Indeed, the two powers in the MSTW parameterisation allow more flexibility
at the smallest x than NNPDF, perhaps because of the pre-processing power for the latter.

Figure 36: The gluon fractional uncertainty from a variety of PDF sets at small x [26] (left).
The gluon and its uncertainty at large x (solid) compared to the up quark (dashed) and
down quark (dotted) [108] (right).

Figure 37: The strange quark and its uncertainty [28] from various PDF sets.
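The slow growth of the uncertainty for a single-power input gluon can be made concrete numerically. The following is only a minimal sketch: the function name and the value used for the uncertainty on the power are illustrative assumptions, not numbers taken from any PDF fit.

```python
import math


def fractional_gluon_uncertainty(x, delta_lambda):
    """Fractional uncertainty Delta g / g for an input g(x) ~ x**lambda
    whose power is uncertain by +-delta_lambda.

    Linearising x**(lambda +- delta_lambda) in delta_lambda gives
    Delta g / g = delta_lambda * ln(1/x).
    """
    if not 0.0 < x <= 1.0:
        raise ValueError("x must lie in (0, 1]")
    return delta_lambda * math.log(1.0 / x)


# Assumed, purely illustrative uncertainty on the small-x power.
DELTA_LAMBDA = 0.02

for x in (1e-2, 1e-3, 1e-4, 1e-5):
    frac = fractional_gluon_uncertainty(x, DELTA_LAMBDA)
    print(f"x = {x:.0e}: Delta g / g = {frac:.3f}")
```

Because the growth is only logarithmic, going from x = 0.01 to x = 0.0001 merely doubles the fractional uncertainty, which is why a single-power form cannot reproduce the rapid expansion seen for the more flexible parameterisations.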

There are also parameterisation variations in the high-x gluon. Generally high-x PDFs are
parameterised so that they behave like $(1-x)^{\eta}$ as $x \to 1$. Even though the parameterisation does contain a
term of this form there is more flexibility in the CTEQ parameterisation (again seemingly even more
than for NNPDF). This allows a very hard high-x gluon distribution, as in the right of Fig. 36, which
is still consistent with published Tevatron jet data [108]. However, one might ask whether the gluon,
which is usually thought of as being radiated from quarks, should be allowed to be harder than the up
valence distribution for $x \to 1$. This excess of gluons does actually disappear by $Q^2 = 100\,\mathrm{GeV}^2$ [108]
due to fast radiation of very high-x gluons to smaller-x gluons.

The other parton distribution with significant dependence on the choice of parameterisation is the
strange quark distribution. In fact the direct fit to $s, \bar{s}$ from dimuon data has tended to lead to a
significant uncertainty increase compared to previous assumptions of a fixed fraction of strange in the
well-constrained total light sea. This direct constraint is only for x > 0.01 and there are a wide
variety of assumptions about what happens below this. In the MSTW08 PDFs it is assumed that the
shape of the strange distribution can be largely inferred from the theory assumption that the suppression
of this distribution is of the same form as the distributions of the massive quarks charm and bottom.
This implies that below x = 0.01 the strange distribution is a fixed fraction of the total sea. As seen in
Fig. 37 this results in a shape which is significantly different to that in CTEQ6.6 despite a fit to the
same data. In the CTEQ6.6 distributions the assumption is weaker, i.e. only that there is the same
small-x power for strange as for light quarks. However, there is even a significant difference in the region
of the data, which must be due to the effect of nuclear corrections and/or the heavy quark treatment.
NNPDF2.0, which also includes dimuon data, imposes no theoretical constraint (other than positivity
of the dimuon cross section) on the strange quark distribution at small x. This results in a very large
uncertainty which then impacts on the other small-x light quarks since it is only the charge weighted
sum which is constrained by HERA structure function data at small x. Due to their simple choice of
parameterisation of the strange distribution and the fact that at small x it is all generated by evolution
the strange distribution in the HERAPDF1.0 and GJR08 PDF sets will have an uncertainty at small
x similar to that of MSTW08. Due to a lack of any theory constraint that of ABKM09 is similar to
NNPDF2.0, though since the small-x behaviour is a single power the variation is not quite as large.

3.4 Heavy Quarks 

The treatment of heavy quarks is something that nearly every group does slightly, or sometimes
significantly, differently, and it can lead to perhaps surprisingly different results for the parton distributions
extracted. In treating heavy quarks in parton scattering there are two distinct regimes. Near threshold
for the quark production, i.e. $Q^2 \sim m_H^2$, massive quarks are not treated as partons. They are entirely
created in the final state and are described using the so-called Fixed Flavour Number Scheme (FFNS),
e.g. for structure functions

$$F(x,Q^2) = C^{FF}_k(Q^2/m_H^2) \otimes f^{n_f}_k(Q^2), \qquad (84)$$

where $f^{n_f}_k$ represents the light partons only. This is exact, but at each perturbative order there are
$\ln^n(Q^2/m_H^2)$ terms which are not resummed. There is argument about the importance of these, but
it is unlikely that resummation is universally unimportant. For structure functions the coefficient
functions have been calculated to NLO [173, 174], and there is some progress at NNLO [175], but the
coefficient functions are not calculated yet for many processes beyond LO. Alternatively, at very high
scales $Q^2 \gg m_H^2$ heavy quarks can be assumed to behave like the massless quarks. In this case we have
heavy quark parton distributions and sum the $\ln(Q^2/m_H^2)$ terms via evolution. The simplest form of this
is known as the Zero Mass Variable Flavour Number Scheme (ZM-VFNS), though mass dependence
does come into the boundary conditions for evolution (calculated up to $\mathcal{O}(\alpha_S^2)$ in [176] and to $\mathcal{O}(\alpha_S^3)$
in [175]). This scheme is the normal assumption in calculations at high scales. It is not exact since it
ignores $\mathcal{O}(m_H^2/Q^2)$ corrections, e.g. for structure functions

$$F(x,Q^2) = C^{ZMVF}_j \otimes f^{n_f+1}_j(Q^2). \qquad (85)$$

This approximation does not matter if the scale of physics is $Q^2 \gg m_H^2$. However, in fitting structure
function data in global fits one goes from the region $Q^2 \sim m_H^2$ to $Q^2 \gg m_H^2$ via the less clear region in
between. Hence, for maximum precision one needs a General Mass Variable Flavour Number Scheme
(GM-VFNS) interpolating between the two well-defined limits.
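The two limiting regimes can be illustrated with a short sketch. It assumes that the zero-mass scheme switches a heavy flavour on at $Q^2 = m_H^2$ (a common convention, but a choice) and uses illustrative mass values; the function names are hypothetical and the top quark is ignored.

```python
def active_flavours(q2, m_c=1.4, m_b=4.75):
    """Number of active quark flavours in a zero-mass variable flavour
    scheme, assuming each heavy flavour turns on at Q^2 = m_H^2.

    q2 is in GeV^2; the default masses (GeV) are illustrative assumptions.
    """
    n_f = 3  # u, d, s are always treated as massless partons
    if q2 > m_c ** 2:
        n_f += 1  # charm becomes a parton
    if q2 > m_b ** 2:
        n_f += 1  # bottom becomes a parton
    return n_f


def neglected_mass_correction(q2, m_h):
    """Size of the m_H^2 / Q^2 terms that the zero-mass scheme drops."""
    return m_h ** 2 / q2


for q2 in (1.5, 4.0, 30.0, 1.0e4):
    print(q2, active_flavours(q2), neglected_mass_correction(q2, 1.4))
```

Just above the charm threshold the dropped $m_H^2/Q^2$ terms are of order one, while at $Q^2 = 10^4\,\mathrm{GeV}^2$ they are negligible; the intermediate region is where a GM-VFNS is needed.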

There are various definitions possible [HIl ]I7S1 UZHl HZSl IM USD [M 1301 M, and there is a review 
in [184] and a numerical comparison of alternatives in [185]. (A theoretical underpinning is provided in
[186].) The versions used by MSTW (TR/TR') [178, 182] and CTEQ (ACOT) [177] [m] have converged
somewhat in recent years. Initially the ACOT prescriptions did not incorporate the correct kinematics
term by term in the expansions (though violations were limited by cancellations between terms). This
was rectified, but in a complicated fashion, in [178]. The simplest choice in the heavy flavour coefficient
functions is now commonly based on the ACOT($\chi$) prescription [181], i.e. the scaling variable x is
replaced by $\chi = x(1 + 4m_H^2/Q^2)$, which automatically incorporates the correct kinematic limit. However,
there are still choices of $m_H^2/Q^2$-dependent factors, ordering of the perturbation series, and even subtle
changes in scaling variable (see e.g. [187]). Various significant differences still exist, as illustrated by
comparison to the recent H1 data on heavy flavour production in Fig. 26.

Figure 38: The comparison of the CTEQ up quark in a fit using the ZM-VFNS (dashed) and
a GM-VFNS (left) (MRST is dashed-dotted) and a comparison of predictions at the LHC
from the fit using the ZM-VFNS (labelled CTEQ6.1) and a GM-VFNS (labelled CTEQ6.5)
along with the prediction when a large intrinsic charm distribution is included (labelled
IC-sea) (right) [HH].

Figure 39: A comparison of the gluon (left) and up quark (right) in two versions of a
GM-VFNS [138], where the band represents the uncertainty on the MRST2004 PDF due to
experimental errors.
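The way the rescaled variable enforces the kinematic limit can be checked directly. This is a sketch under the assumption that the proton mass is neglected, so that the invariant mass squared of the hadronic final state is $W^2 = Q^2(1/x - 1)$ and heavy quark pair production requires $W^2 \geq 4m_H^2$; the function names and the charm mass are illustrative.

```python
def acot_chi(x, q2, m_h):
    """Rescaled variable chi = x * (1 + 4 m_H^2 / Q^2)."""
    return x * (1.0 + 4.0 * m_h ** 2 / q2)


def pair_production_allowed(x, q2, m_h):
    """Heavy quark pair threshold, W^2 >= 4 m_H^2, with the proton mass
    neglected so that W^2 = Q^2 (1/x - 1)."""
    return q2 * (1.0 / x - 1.0) >= 4.0 * m_h ** 2


M_CHARM = 1.4  # GeV, illustrative assumption

for x, q2 in ((0.1, 4.0), (0.5, 4.0), (0.5, 100.0)):
    chi = acot_chi(x, q2, M_CHARM)
    # chi <= 1 holds exactly when the pair is kinematically allowed
    print(x, q2, round(chi, 3), pair_production_allowed(x, q2, M_CHARM))
```

Requiring $\chi \leq 1$ is algebraically the same condition as $W^2 \geq 4m_H^2$, and for $Q^2 \gg m_H^2$ the rescaled variable reduces to x.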

The importance of using a GM-VFNS instead of a massless approach was illustrated by CTEQ [188]
(it had been assumed by MRST/MSTW since [12] that the GM-VFNS was preferable, but a detailed
study of the difference was not presented). This can be seen in the left of Fig. 38, where the up quark
with uncertainties is compared with previous versions, e.g. CTEQ6.1 (dashed). Clearly the use of the
ZM-VFNS can lead to a considerable error in the PDFs. This consequently can lead to a large change
in predictions using CTEQ partons at the LHC of 5-10%, as seen in the right of Fig. 38, where we also
note the possible effects of intrinsic charm.

Although this large change in improving from the ZM-VFNS to a GM-VFNS can be viewed as a
correction due to the physics missing in the ZM-VFNS, the freedom in defining a GM-VFNS at finite
perturbative order means there is still an associated theoretical uncertainty. This was studied briefly in
[138], where the differences in PDFs obtained using the NLO prescriptions in [178] and [182], but much
the same data sets, were investigated. The change of scheme could lead to changes of up to 2% in PDFs,
as seen in Fig. 39, and this can lead to up to a 3% change in $\sigma_W$ and $\sigma_Z$ at the LHC. This is a genuine
theory uncertainty due to competing but equally valid choices of a definition of a GM-VFNS, and is
analogous to the freedom in making a choice of factorisation and renormalisation scale. The variation in
PDFs obtained from fits using different GM-VFNS choices has recently been investigated in more detail
in [189], and 2-3% seems a reasonable estimate at NLO (though changes from ZM-VFNS to GM-VFNS
were found to be typically $\sim 5$-6% in this case, slightly smaller than for CTEQ). Moreover, at
NNLO the variation was reduced to $\sim 1$%, so the expected reduction in ambiguity at higher orders in
perturbation theory is verified.

There could also be some nonperturbative (intrinsic) heavy flavour, as well as that generated by
perturbative evolution. This is suppressed by $\Lambda_{QCD}^2/Q^2$ or possibly $\Lambda_{QCD}^2/W^2 \sim \Lambda_{QCD}^2/(Q^2(1-x))$, and
hence likely to be enhanced at high x [lD]. CTEQ have investigated the possibility [190] by constraining
the intrinsic charm in a normal global fit (and considering an effect which can be large at all x). This
suggests a maximum 1% integrated momentum density contribution of intrinsic charm (and the same
for anticharm). The possible effects of this are shown in the right of Fig. 38. However, MSTW [26] have
checked against old EMC data [191], finding at most (1/10)th of this value.

To summarise, the different fitting groups have different ways of dealing with heavy flavours. 

• MSTW08 use the definition of a GM-VFNS in [182] at LO, NLO and NNLO, and precise details
are described in [26]. The group have used a GM-VFNS for all partons since MRST98 [12], but
the details changed in 2006. Before 2006 the NNLO GM-VFNS prescription was approximate,
i.e. the first NNLO distributions correct in this sense are in [138], and the correction led to a few
percent change compared to [107]. Even now the NNLO GM-VFNS requires some modelling at
low $Q^2$ due to the absence of the full $\mathcal{O}(\alpha_S^3)$ FFNS coefficient functions (though some GM-VFNS
definitions would not require these at NNLO). The information on the small-x [192] and threshold
limits [193] at NNLO is used. Since the massless splitting and coefficient functions are known
at NNLO the GM-VFNS becomes exact at this order well above $Q^2 = m_H^2$. PDFs are also made
available for 3 and 4 flavours [194, 195].

• CTEQ6.6 (and CT10) use the definition of a GM-VFNS in [ISB] at NLO as default. The GM-VFNS
version was only used as a special case (e.g. [196]) in the pre-CTEQ6.5 sets, where the
ZM-VFNS was always used as default.

• NNPDF2.0 uses the ZM-VFNS. The group has versions of a GM-VFNS [183] at NLO and one
at NNLO, benchmarked [185] along with MSTW and CTEQ, and there is a very new set using
these [MI].

• HERAPDF1.0 uses the same GM-VFNS as MSTW, i.e. that in [182]. Previous fits have used the
older TR prescription [178], but usually compared to ZM-VFNS and FFNS.

• ABKM09 perform their fit using the FFNS. They compare to the GM-VFNS defined in [30] and claim
insensitivity to using a GM-VFNS. However, in this definition, although charm and bottom quark
distributions are defined, this is only to fixed order, i.e. unlike in other variable flavour schemes,
resummation of the $\ln(Q^2/m_H^2)$ terms in the parton distributions is not performed in the fit
comparisons (it is ultimately when generating 4- and 5-flavour PDF sets from the inputs obtained
from the FFNS fit). While the PDFs are obtained at NLO and at NNLO, the heavy quark
treatment is identical at both orders (see [147] for developments).



• GJR08 use the FFNS exclusively and, like ABKM, use the same definition, i.e. up to $\mathcal{O}(\alpha_S^2)$, for
the heavy flavour coefficient functions for both NLO and NNLO. PDFs are converted into variable
flavour scheme evolution at NLO [197] and NNLO [198].

Figure 40: The values of $\alpha_S(m_Z^2)$ for PDFs of different groups. The error bar represents the
central value and uncertainty for each set, and the points the values of $\alpha_S(m_Z^2)$ at which
extra sets are made available. Plots by G. Watt [170].

Figure 41: The correlation of $\alpha_S$ and the gluon distribution (left) and the up quark distribution
(right) for the MSTW2008 PDFs [120]. In each case the solid line represents the central
PDF at given $\alpha_S(m_Z^2)$ and the band the uncertainty due to experimental errors.



The different groups also use values of the charm quark mass in the range $1.3\,\mathrm{GeV} \leq m_c \leq 1.65\,\mathrm{GeV}$,
and of the bottom quark mass in the range $4.3\,\mathrm{GeV} \leq m_b \leq 5\,\mathrm{GeV}$. In [29] and [30] variation is allowed,
and varying the value of $m_c$ by 0.2 GeV can change PDFs and predictions by up to a couple of percent.
MSTW have recently completed a detailed study of mass dependence in the PDFs and predictions,
agreeing with the HERA fit results [195], and also make PDFs available for 3 and 4 flavours for these
different masses. There are also NNPDF results [140] on mass dependence.

3.5 PDFs and the Strong Coupling 

Each group deals with the strong coupling in a slightly different manner. For MSTW08, ABKM09 and
GJR08 the $\alpha_S(m_Z^2)$ values and uncertainty are determined by the fit both at NLO and at NNLO (in
each case the NNLO value is about 0.002-0.004 lower than the NLO value; see later). However, the
values are rather different, i.e. $\alpha_S(m_Z^2) = 0.1202$, 0.1179 and 0.1145 respectively at NLO (0.1171, 0.1135
and 0.112 at NNLO). The other groups pick standard values and uncertainties, i.e. 0.118 for CTEQ6.6,
0.119 for NNPDF2.0 and 0.1176 for HERAPDF1.0. In addition some groups provide additional sets at
a variety of $\alpha_S$ values [120, 28, 200]. The respective NLO values of $\alpha_S(m_Z^2)$, the uncertainties, and other
values available are summarised in Fig. 40 [170].
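The fitted values quoted above can be compared with a few lines of arithmetic. The sketch below only reuses the numbers given in the text; the dictionary layout and function names are illustrative.

```python
# (NLO, NNLO) values of alpha_S(m_Z^2) as quoted in the text for the
# groups that determine the coupling in the fit.
FITTED_ALPHA_S = {
    "MSTW08": (0.1202, 0.1171),
    "ABKM09": (0.1179, 0.1135),
    "GJR08": (0.1145, 0.112),
}


def nlo_to_nnlo_shift(values):
    """Decrease in alpha_S(m_Z^2) when going from NLO to NNLO, per group."""
    return {group: round(nlo - nnlo, 4) for group, (nlo, nnlo) in values.items()}


def spread_between_groups(values, order_index):
    """Largest minus smallest fitted value at one order (0 = NLO, 1 = NNLO)."""
    column = [pair[order_index] for pair in values.values()]
    return round(max(column) - min(column), 4)


print(nlo_to_nnlo_shift(FITTED_ALPHA_S))
print(spread_between_groups(FITTED_ALPHA_S, 0))
```

The spread between groups at a fixed order is comparable to, or larger than, the shift any single group sees between NLO and NNLO, which is one reason comparisons of predictions at a common $\alpha_S$ value are informative.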

One can also look at PDF changes and correlations in uncertainties for different $\alpha_S(m_Z^2)$. ABKM09
and GJR08 simply include $\alpha_S$ as an additional parameter in their error determinations, and uncertainties
on physical quantities are obtained by summing over all free parameters in the error matrix. Due to
the more complicated dynamical tolerance procedure this is not so straightforward for MSTW08. As
stated, the coupling uncertainty is determined from the fit, i.e. $\alpha_S(m_Z^2) = 0.1202^{+0.0012}_{-0.0015}$ at NLO
($\alpha_S(m_Z^2) = 0.1171 \pm 0.0014$ at NNLO), and the PDFs are presented for the $\pm\sigma/2$ and $\pm\sigma$ uncertainty
$\alpha_S(m_Z^2)$ values, and similarly for 90% confidence level values. As $\alpha_S(m_Z^2)$ departs from its best value the PDF uncertainties
reduce since the quality of the fit is already worse than the best global fit. The PDFs and their
uncertainties for different $\alpha_S(m_Z^2)$ values are shown in Fig. 41. The expected gluon-$\alpha_S(m_Z^2)$ small-x
anti-correlation is seen and this also leads to a high-x gluon-$\alpha_S(m_Z^2)$ correlation from the momentum
sum rule. The up quark at high $Q^2$ is also shown. The gluon feeds into the evolution of the quarks,
but the change in $\alpha_S(m_Z^2)$ just outweighs the gluon change, i.e. a larger $\alpha_S(m_Z^2)$ $\to$ slightly more evolution.
There is a strong anti-correlation at high x due to evolution and the positive coefficient function. The
MSTW08 NNLO predictions for Higgs (120 GeV) production for different allowed $\alpha_S(m_Z^2)$ values and
their uncertainties are shown for both the Tevatron and LHC in Fig. 42. The uncertainty increases by a
factor of 2-3 (up more than down) at the LHC. The direct $\alpha_S(m_Z^2)$ dependence is mitigated somewhat
by the anti-correlated small-x gluon. At the Tevatron the two effects add due to the correlation at high
x but in this case the intrinsic gluon uncertainty dominates.

[Figure 42 plot: Higgs ($M_H$ = 120 GeV) with MSTW 2008 NNLO PDFs at the Tevatron ($\sqrt{s}$ = 1.96 TeV) and the LHC ($\sqrt{s}$ = 14 TeV), for $\alpha_S(m_Z^2)$ between $-1\sigma$ and $+1\sigma$, with 68% C.L. uncertainties.]

Figure 42: The correlation of $\alpha_S$ and the Higgs cross section at the Tevatron and LHC
[120]. The closed points and bands represent the best prediction and uncertainty at each
$\alpha_S(m_Z^2)$ value, the inner dotted lines the uncertainty at the fixed best value of $\alpha_S(m_Z^2)$ and
the outer dotted line the uncertainty including that on the coupling, which is the envelope of
the predictions. The triangles represent the variation due to changes in the gluon only and
the open squares the variation in the factor of $\alpha_S^2$ alone.

[Figure 43 plot: Higgs production ($m_H$ = 120 GeV) at 7 TeV.]

Figure 43: The uncertainty in the CTEQ prediction for the 120 GeV Higgs cross section at
the LHC from each of the 22 PDF eigenvectors (blue) and variations in $\alpha_S$ (red) at 7 TeV
(left) and 14 TeV (right) [200].

The HERAPDF1.0 fit considers $\alpha_S = 0.1176 \pm 0.002$, basing their central value on the world
average [201], and add the results from the two extreme values in quadrature with other uncertainties.
CTEQ also, in principle, base their value on the world average, choosing the similar value of
$0.118 \pm 0.002$ [200], where the uncertainty is the 90% confidence level, though they also point out that the central
value is the same as their best fit. They also prove analytically that adding the uncertainties from
the fits at their limits of $\alpha_S(m_Z^2)$ in quadrature with those from the other orthogonal parameter
eigenvectors is, in the quadratic approximation for $\Delta\chi^2$, exactly equivalent to fitting with $\alpha_S(m_Z^2)$ free and
constructing the orthonormal eigenvectors from scratch using the extra variable. The one caveat to this
is that for the latter case the 90% confidence level on $\alpha_S(m_Z^2)$ must correspond to a deterioration in
the fit quality which is exactly the same as that for the 90% confidence level for the parton parameters,
i.e. $\Delta\chi^2 = 100$. This means the value of $\alpha_S(m_Z^2)$ must be included as a data point in the fit with
the appropriate (presumably rather large) weighting factor. CTEQ examine the uncertainty in Higgs
and $t\bar{t}$ cross sections using both approaches, CTEQ6.6AS and CTEQ6.6FAS respectively (F stands for
floating $\alpha_S$), finding that the uncertainty is the same up to at most 10%. The uncertainty on the cross
section for a 120 GeV Higgs boson at the LHC as a function of the parton eigenvectors and $\alpha_S$ is shown
in Fig. 43.

[Figure 44 plots: "NNPDF1.2 - Correlation of $g(x,Q^2)$ and $\alpha_S(M_Z)$" (left; gluon at $Q^2 = 2\,\mathrm{GeV}^2$ and $Q^2 = 10000\,\mathrm{GeV}^2$) and "NNPDF1.2 - Correlation of PDFs$(x,Q^2)$ and $\alpha_S(M_Z)$" (right; total valence, singlet and triplet at $Q^2 = 2\,\mathrm{GeV}^2$), both as functions of x.]

Figure 44: The correlation coefficient for $\alpha_S$ and the gluon (left) and various quark
distributions (right) from NNPDF [202].
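The quadrature prescription described above can be sketched as follows. This assumes the common symmetric Hessian formula for the PDF part; the function names are hypothetical and the numbers in the usage example are invented purely for illustration.

```python
import math


def hessian_pdf_uncertainty(plus, minus):
    """Symmetric Hessian uncertainty from eigenvector pairs:
    0.5 * sqrt(sum_i (X_i^+ - X_i^-)^2)."""
    return 0.5 * math.sqrt(sum((p - m) ** 2 for p, m in zip(plus, minus)))


def alpha_s_uncertainty(x_upper, x_lower):
    """Half the difference of the predictions obtained from the fits at the
    upper and lower limits of alpha_S(m_Z^2)."""
    return 0.5 * abs(x_upper - x_lower)


def combined_uncertainty(plus, minus, x_upper, x_lower):
    """PDF and alpha_S uncertainties added in quadrature."""
    return math.hypot(
        hessian_pdf_uncertainty(plus, minus),
        alpha_s_uncertainty(x_upper, x_lower),
    )


# Invented toy cross sections (arbitrary units), two eigenvector pairs.
eigen_plus = [10.3, 10.0]
eigen_minus = [9.7, 10.0]
print(combined_uncertainty(eigen_plus, eigen_minus, 10.4, 9.6))
```

In this picture the two $\alpha_S$ sets simply play the role of one additional eigenvector pair, which is the content of the equivalence stated in the text within the quadratic approximation.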

NNPDF2.0 also choose a particular external value of $\alpha_S(m_Z^2)$, in their case 0.119, with a one-$\sigma$
uncertainty of 0.0012, or 0.002 for the 90% confidence level. Due to their method of determining uncertainties
via replicas they have an alternative manner of dealing with the $\alpha_S$ uncertainty [202]. In order to
calculate a quantity they use the PDF sets obtained at different $\alpha_S(m_Z^2)$ values, with the number of replicas at a
particular value of $\alpha_S(m_Z^2)$ determined by the probability of $\alpha_S(m_Z^2)$ taking that value, i.e.

$$N_{\rm rep}^{\alpha_S} \propto \exp\left(-\frac{(\alpha_S - \alpha_S^{(0)})^2}{2(\delta\alpha_S^{(68)})^2}\right), \qquad (86)$$

where $\alpha_S^{(0)}$ is the central value and $\delta\alpha_S^{(68)}$ the 68% confidence level uncertainty. They verify that there
is an anti-correlation between the small-x gluon and $\alpha_S$, and the opposite at high x (Fig. 44). They also
compare cross-sections for the Higgs boson from different groups and using different prescriptions for the
uncertainty due to the coupling (there are also results in [185]). It is found that adding the deviations
at the extreme values of $\alpha_S$ in quadrature with other uncertainties is generally a good approximation,
as is now better understood from the CTEQ result above. Some of the worst discrepancies between
MSTW, CTEQ and NNPDF, e.g. the predicted NLO Higgs cross section from CTEQ can be 7% lower
than MSTW, are seen to be about halved by comparing results at the same $\alpha_S$ value. These results
use NNPDF1.2, but more recent results can be found in [203], and are shown in Fig. 45. There
is clearly some discrepancy, e.g. the lower CTEQ value of $\sigma_H$ (due to lower $\alpha_S$ and probably related to
gluon parameterisation and default charm mass) and the lower NNPDF value of $\sigma_Z$ (similar for $W^{\pm}$,
probably related to use of the ZM-VFNS).

Figure 45: The variety in predictions from different groups and central values of $\alpha_S$ for the
Z cross section (left) and the 120 GeV Higgs cross section (right) at the LHC (7 TeV) [203].
(Plot title: PDF4LHC benchmarks - LHC 7 TeV.)
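The replica allocation of Eq. (86) can be sketched numerically. The central value 0.119 and the 68% uncertainty 0.0012 are those quoted in the text; the grid of $\alpha_S$ values, the total number of replicas and the function name are assumptions made for illustration only.

```python
import math


def replica_counts(alpha_values, alpha_central, delta_alpha, n_total):
    """Share n_total replicas between PDF sets fitted at different alpha_S
    values, with Gaussian weights exp(-(a - a0)^2 / (2 delta^2)).

    Counts are rounded to integers, so their sum may differ slightly
    from n_total.
    """
    weights = [
        math.exp(-((a - alpha_central) ** 2) / (2.0 * delta_alpha ** 2))
        for a in alpha_values
    ]
    norm = sum(weights)
    return [round(n_total * w / norm) for w in weights]


# Assumed grid of alpha_S(m_Z^2) values at which replica sets exist.
grid = [0.117, 0.118, 0.119, 0.120, 0.121]
counts = replica_counts(grid, alpha_central=0.119, delta_alpha=0.0012, n_total=100)
for alpha, n in zip(grid, counts):
    print(f"alpha_S = {alpha:.3f}: {n} replicas")
```

Any observable averaged over the combined replica sample then carries the PDF and $\alpha_S$ uncertainties together, without assuming that they add in quadrature.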

4 Other sources of Uncertainty 

In the previous section we have discussed various factors of what might be deemed theoretical un- 
certainty, such as parameterisations, heavy quark treatments, choices in strong coupling, which are 
unavoidable, and lead to differences between the PDFs obtained by groups and the resulting predic- 
tions. As well as these there are additional theoretical corrections which can lead to further changes or 
corrections. These are systematic corrections which will lead to the PDFs and their predictions from all 
groups being modified. Some are already investigated, perhaps partially by some groups or by others 
working in the field of perturbative QCD and parton distributions. Some of the most important are: 

• Standard higher orders, i.e. NNLO in perturbation theory and beyond.

• QED and weak corrections, which are nominally small, but where there might sometimes be
enhancements.

• Resummations, e.g. small x ($\alpha_S^n \ln^{n-1}(1/x)$), or large x ($\alpha_S^n \ln^{2n-1}(1-x)$).

• Corrections at low $Q^2$, e.g. higher twist and possible saturation effects.

We will now discuss each of these briefly.

4.1 NNLO corrections

We have already pointed out that some groups produce NNLO PDFs, though the results vary quite
a lot. As noted, the extraction can be from a nearly complete NNLO definition within the global fit
procedure, i.e. NNLO evolution, massless coefficient functions for structure functions and Drell-Yan
vector boson production are known exactly, while some approximation and/or modelling is required
for massive quark coefficient functions or jet production. So the degree of theoretical approximation
required in an NNLO fit, particularly if a GM-VFNS is used, is not large and the PDFs can certainly
be taken seriously. However, we have so far not considered the change in the PDFs as one goes from
NLO to NNLO and the consequences.

It is important to note that because the PDFs are not physical quantities the NNLO PDFs are
not simply a more accurate version of the NLO PDFs. There can be systematic differences. This is
illustrated in the left of Fig. 46, where we compare the up quark, the most accurately determined
parton distribution, at NLO and NNLO. One can see that the shape is different as a function of x
and the central value of the NLO and NNLO PDFs can differ by 3-4 times the uncertainty. This does
not necessarily mean a large change in any physical quantities however, as the change in PDFs can
be compensated by a change in the coefficient functions. Indeed, both the PDFs shown fit the same
structure function data with a similar quality. As well as coefficient functions and PDFs the physical
quantities implicitly depend on the strong coupling constant $\alpha_S$. The NNLO corrections are largely
positive, i.e. evolution of PDFs increases in speed at both large and small x and many cross section
corrections are positive, e.g. the K-factor for NLO and NNLO fixed target Drell-Yan production is shown
in the right of Fig. 46, the latter being $\sim 10$% bigger. Hence, it seems very likely that the coupling will
have to become smaller at NNLO in order to compensate. Indeed, this is seen in all cases: MSTW [26],
ABKM [30] and GJR [31] all see a fall in the value of $\alpha_S(m_Z^2)$ of 0.002-0.004 at NNLO compared to NLO.
Hence, precise predictions using NNLO PDFs require the simultaneous use of NNLO cross-sections (or
vice versa) and the appropriate coupling at NNLO.

Figure 46: The MSTW2008 up quark distributions at NLO and NNLO (left). The absolute
values, with $1\sigma$ uncertainties, are shown in the upper plot, and the ratio compared to the
central value at NLO is shown in the lower plot. The K-factors for Drell-Yan production
at NLO and NNLO [138] for a muon pair of invariant mass 4 GeV (right).

Since we do have these PDF sets determined at NNLO, surely it is best, i.e. most accurate, to make
use of these. In principle this is correct; however, we only know some hard cross-sections at NNLO.
Processes with two strongly interacting particles are largely completed: DIS coefficient functions and
sum rules, $pp(\bar{p}) \to \gamma^*, W, Z$ (including rapidity dist.), $H$, $A^0$, $WH$, $ZH$ fMIM [206, 207, 208, 209, 210].
For other final states the NNLO coefficient functions are not known and so NLO PDFs are still more
appropriate. There are even some processes where only LO is known, particularly those with a large
number of final state particles or very exclusive final states.

However, as well as providing us with maximum precision, if it is available NNLO also tells us about
the convergence of perturbation theory. For most structure functions convergence is guaranteed, because
the PDFs are obtained by comparison to the very accurate data. Hence, it is predictions (including the
normalisation of the total W and Z cross-sections at the Tevatron since the normalisation uncertainty
of the data is large), which are more illuminating. For W and Z cross-sections at both the Tevatron and
the LHC the perturbation series is reasonably convergent. In both cases the prediction is 3-4% higher
at NNLO than at NLO [26], but this change is about twice the uncertainty quoted at each order. The
NNLO prediction is in better agreement with the measurements at the Tevatron [212, 213]. The change
is dominated by the change in the PDFs as one goes from NLO to NNLO rather than the change in
the cross section. The NNLO Higgs cross section is 25% bigger than at NLO. In this case the change
is completely dominated by the NNLO contribution to the cross section. In both cases better stability
is achieved by allowing the NNLO strong coupling value $\alpha_S(m_Z^2)$ to be 0.003 lower than at NLO than
by using the same value in both cases.

Figure 47: Gluon distributions at LO, NLO and NNLO along with their uncertainties at
each order due to experimental errors.

Hence, there is some question about the stability of cross-sections as one progresses to higher orders
in the perturbative series. This may sometimes be related to the issue of resummations at large or small
x and whether these are important. In Section 2.7 we gave a preliminary discussion of the convergence
at small x. In Fig. 47 we see a comparison of the gluon distribution extracted from the global fit at
LO, NLO and NNLO [26]. The additional positive small-x contributions in the splitting function $P_{qg}$ at
each order lead to a smaller very small-x gluon at each successive order. Hence, in this regime there is
clearly fairly poor stability. This is similar for $F_L(x, Q^2)$, though there is some compensation due to the
NNLO coefficient function. A more dramatic consequence of this lack of convergence can be seen for
the LHC. In Fig. 48 [214] we show the LO, NLO and NNLO predictions for Z and $\gamma^*$ production at
the LHC for 14 TeV centre-of-mass energy, as a function of rapidity and invariant mass. There is good
stability in the predictions for very high final state masses, but it becomes worse at lower scales where
$\alpha_S$ is larger and large $\ln(s/M^2)$ terms appear in the perturbative expansion, which are equivalent to
$\ln(1/x)$. This suggests resummation may be necessary in this regime.
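The link between $\ln(s/M^2)$ and $\ln(1/x)$ follows from leading-order kinematics, where a lepton pair of invariant mass M and rapidity y probes momentum fractions $x_{1,2} = (M/\sqrt{s})\, e^{\pm y}$. The sketch below assumes these leading-order relations; the function names and the chosen masses are illustrative.

```python
import math


def momentum_fractions(mass, rapidity, sqrt_s):
    """Leading-order parton momentum fractions x1, x2 for a final state of
    invariant mass `mass` (GeV) and rapidity `rapidity` at collider
    energy `sqrt_s` (GeV)."""
    root_tau = mass / sqrt_s
    x1 = root_tau * math.exp(rapidity)
    x2 = root_tau * math.exp(-rapidity)
    if x1 > 1.0 or x2 > 1.0:
        raise ValueError("rapidity outside the kinematically allowed range")
    return x1, x2


def large_logarithm(mass, sqrt_s):
    """ln(s / M^2), which equals ln(1 / (x1 * x2))."""
    return math.log(sqrt_s ** 2 / mass ** 2)


SQRT_S = 14000.0  # GeV, the 14 TeV LHC energy used in the text

for mass in (14.0, 91.2, 1000.0):
    x1, x2 = momentum_fractions(mass, 0.0, SQRT_S)
    print(mass, x1, x2, large_logarithm(mass, SQRT_S))
```

At central rapidity $\ln(s/M^2) = 2\ln(1/x)$, so low-mass pairs at the LHC sample small x and large logarithms together, and moving away from central rapidity pushes one of the two fractions to even smaller values.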

We note that at large x there is enough information available to perform reliable approximations
to fits at NNNLO, see e.g. [215, 216], in the nonsinglet sector. This seems to lead to no significant
difference to the results at NNLO, with a stabilisation in the values of $\alpha_S$, implying good theoretical
convergence in the kinematic region where these fits are performed.

Figure 48: The Drell-Yan cross section at LO, NLO and NNLO at the LHC [214]. The dotted
lines show the contributions from various subprocesses at each order, e.g. NNLO qg is the
contribution from a quark and gluon initial state at NNLO. (Plot title: $\gamma^*/Z$ rapidity
distributions at LHC.)



4.2 Electroweak corrections 

In principle the smallness of the electromagnetic and weak couplings should mean the corrections from 
this source are small. However, at high scales a ~ a| so they may be comparable to NNLO QCD 
corrections. Additionally there can be some enhancements, or violations of symmetry not present in 
QCD which can be important. The simplest thing one can consider is the QED-improved DGLAP 
equations at leading order in $\alpha_s$ and $\alpha$, where

$$\tilde{P}_{qq} = C_F^{-1} P_{qq}, \quad P_{\gamma q} = C_F^{-1} P_{gq}, \quad P_{q\gamma} = T_R^{-1} P_{qg}, \quad P_{\gamma\gamma} = -\frac{2}{3} \sum_i e_i^2\, \delta(1-y), \qquad (88)$$

$\gamma(x,\mu^2)$ is the photon distribution and momentum is conserved:

$$\int_0^1 dx\, x \left\{ \sum_i q_i(x,\mu^2) + g(x,\mu^2) + \gamma(x,\mu^2) \right\} = 1. \qquad (89)$$

Figure 49: QED-induced difference in valence quarks in the proton compared to those in the
neutron [217].

Figure 50: Drell-Yan cross-sections with an initial state photon [218].
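The momentum sum rule of Eq. (89) can be illustrated numerically. The following is a minimal sketch, assuming toy momentum densities of the form $x f(x) = A\, x^a (1-x)^b$. The shapes and the momentum shares (including the small photon share) are illustrative assumptions, not values from any fit.

```python
from math import gamma


def beta_fn(p, q):
    """Euler beta function B(p, q)."""
    return gamma(p) * gamma(q) / gamma(p + q)


def momentum_density(frac, a, b):
    """Return x*f(x) = A * x**a * (1-x)**b carrying momentum fraction `frac`.

    The normalisation A follows from integral_0^1 x**a (1-x)**b dx = B(a+1, b+1).
    """
    norm = frac / beta_fn(a + 1.0, b + 1.0)
    return lambda x: norm * x**a * (1.0 - x) ** b


def integrate(func, n=20000):
    """Midpoint rule on (0, 1); adequate for these smooth toy shapes."""
    h = 1.0 / n
    return h * sum(func((i + 0.5) * h) for i in range(n))


# Hypothetical momentum shares (they must add up to 1) and shape parameters (a, b).
TOY_SHARES = {"quarks": 0.575, "gluon": 0.420, "photon": 0.005}
TOY_SHAPES = {"quarks": (0.5, 3.0), "gluon": (0.1, 5.0), "photon": (0.3, 4.0)}

densities = {
    name: momentum_density(TOY_SHARES[name], *TOY_SHAPES[name])
    for name in TOY_SHARES
}


def total_momentum():
    """Left-hand side of the sum rule: quarks + gluon + photon."""
    return sum(integrate(d) for d in densities.values())
```

In this toy setup, giving the photon a nonzero share means the other shares must be reduced for the total to remain 1, which is the mechanism by which the gluon loses a little momentum in a fit.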

There has been one detailed study of the effects of the QED evolution on the PDFs obtained via
a global fit, in [217]. Because there are not any enhancements in Eq. (1571), the effect on the quark
distributions of the QED corrections is negligible at small x, where the gluon contribution dominates
the evolution. The main effect is that the gluon loses a little momentum to the photon in order for the
momentum sum rule to be satisfied. At large x, photon radiation from quarks leads to faster evolution,
roughly equivalent to a slight shift in $\alpha_s$, i.e. $\Delta\alpha_s(m_Z^2) \approx +0.0003$. Overall, the QED effects are much
smaller than many sources of uncertainty. However, the up quarks at high x radiate more photons
than down quarks due to the higher charge weighting. This leads to an automatic violation of the
charge symmetry assumed in Eq. fHSj), as seen in Fig. 49, and this reduces the NuTeV anomaly in the
measurement of $\sin^2\theta_W$ [81, 82]. The other place where QED-corrected PDFs are important is where an
initial state photon plays a role. Consider the electroweak corrections to lepton pair production [218].
In the hard cross-sections the QED effects are typically a few percent and negative, becoming larger in
magnitude at high transverse momentum. However, one also needs to consider photon-induced processes
driven by the photon distribution of the proton, as shown in Fig. 50. These can be a significant fraction
of the other electroweak corrections, and in the opposite direction, i.e. positive.

Large electroweak corrections are potentially possible due to enhanced logarithms of the form
$\alpha_W^n \log^{2n}(E_T^2/M_W^2)$ in the perturbative series [219]. Jet cross-sections are an example [220] where there is
potentially a big effect at LHC energies, where $\log^2(E_T^2/M_W^2)$ is a very large number, as seen in Fig. 51.
Similar results exist for corrections to other processes with a hard scale, e.g. di-boson production [221]
and large-$p_T$ vector bosons in conjunction with jets [222] (though very sensitive to jet vetoes). These
could potentially affect the extraction of PDFs at the LHC if they are not taken account of properly.
In order to do this, though, one must not only have virtual corrections for W, Z, but must also have
contributions where the bosons are emitted as extra final state particles, which will certainly
cancel the loop virtual corrections to some significant extent [223]. Whether the consideration of parton
distributions for weak bosons as well as the photon will aid maximum precision remains to be seen, since
it is known that the evolution will produce electroweak double logarithms [224].

Figure 51: The fractional size of the NLO electroweak effect on high-$E_T$ jet cross-sections [220].
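To give a feeling for why these double logarithms matter at LHC energies, the sketch below evaluates $\log^2(E_T^2/M_W^2)$ and a rough one-loop size estimate. The prefactor $\alpha_W/4\pi$, the value $\alpha_W \approx 1/30$ and the omission of the process-dependent coefficient and sign are assumptions made purely for illustration.

```python
from math import log, pi

M_W = 80.4            # GeV, W boson mass
ALPHA_W = 1.0 / 30.0  # assumed weak coupling alpha / sin^2(theta_W), approximate


def double_log(e_t, mass=M_W):
    """log^2(E_T^2 / M_W^2) for a transverse energy e_t in GeV."""
    return log(e_t**2 / mass**2) ** 2


def sudakov_estimate(e_t):
    """Order-of-magnitude estimate (alpha_W / 4 pi) * log^2(E_T^2 / M_W^2).

    The process-dependent coefficient and the (typically negative) sign
    are deliberately left out.
    """
    return ALPHA_W / (4.0 * pi) * double_log(e_t)


if __name__ == "__main__":
    for e_t in (500.0, 1000.0, 2000.0, 4000.0):
        print(f"E_T = {e_t:6.0f} GeV  log^2 = {double_log(e_t):6.1f}  "
              f"estimate = {sudakov_estimate(e_t):.3f}")
```

Under these assumptions the estimate rises from a few percent at a few hundred GeV to well above ten percent in the multi-TeV range, in line with the qualitative statement that the effect can be large for high-$E_T$ jets.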

4.3 Small-x Theory 

As seen in Section 4.1 there are fairly strong hints of some instability, or lack of convergence in
perturbation theory, when small-x PDFs are probed. The reason for this instability was outlined in Section
2.7: as known since [121, 122, 123], at each order in $\alpha_s$ each splitting function and coefficient function
generally obtains an extra power of $\ln(1/x)$. For the parton distributions these leading logarithms
can be obtained from the BFKL equation for the high-energy limit of the unintegrated (in transverse
momentum k) distribution

$$f(k^2, x) = f_I(Q_0^2) + \int_x^1 \frac{dx'}{x'}\, \alpha_s \int_0^\infty \frac{dq^2}{q^2}\, K(q^2, k^2)\, f(q^2, x'), \qquad (90)$$

where $f(k^2,x)$ is the unintegrated gluon distribution, $g(x,Q^2) = \int_0^{Q^2} (dk^2/k^2)\, f(k^2,x)$, and $K(q^2,k^2)$ is
a calculated kernel now known to NLO [225, 226]. The physical structure functions are then obtained
from

$$\sigma(Q^2, x) = \int \frac{dk^2}{k^2}\, h(k^2/Q^2)\, f(k^2, x). \qquad (91)$$

$h(k^2/Q^2)$ is a calculable impact factor, known for structure functions [192, 227] and some other processes,
e.g. [228]. As mentioned, the global fits usually assume that this is not significant in the region
of interest, though a purely phenomenological investigation [229] did find that resummed terms were
preferred by data.
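The size of the problem can be made concrete with a short sketch. It evaluates the effective expansion parameter $\alpha_s \ln(1/x)$ and the well-known leading-order BFKL growth exponent $\lambda = 4 \ln 2\, N_c \alpha_s/\pi$ (so that distributions behave like $x^{-\lambda}$). The chosen values of $\alpha_s$ and x are illustrative assumptions.

```python
from math import log, pi


def log_enhancement(alpha_s, x):
    """alpha_s * ln(1/x): the factor gained with each extra order at small x."""
    return alpha_s * log(1.0 / x)


def lo_bfkl_power(alpha_s, n_c=3):
    """Leading-order BFKL exponent lambda = 4 ln 2 * N_c * alpha_s / pi."""
    return 4.0 * log(2.0) * n_c * alpha_s / pi


if __name__ == "__main__":
    alpha_s = 0.2  # illustrative value at a low scale
    for x in (1e-2, 1e-3, 1e-4, 1e-5):
        print(f"x = {x:.0e}  alpha_s*ln(1/x) = {log_enhancement(alpha_s, x):.2f}")
    print(f"LO BFKL exponent: {lo_bfkl_power(alpha_s):.3f}")
```

Once $\alpha_s \ln(1/x)$ is of order one, successive fixed orders are no longer parametrically suppressed. The large leading-order exponent is considerably reduced by the NLO corrections and the additional effects discussed in this section.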

The inclusion of the NLO corrections to the BFKL equation and the consequent scale breaking
made the solution much more difficult to obtain due to the difficulty of avoiding nonperturbative
contamination from the infrared region. This led to a concentration of effort more closely related to
the collinear factorisation of the usual perturbative ordering, and in particular the assumption that



Figure 52: Comparison of the leading splitting function $P_+ \approx P_{gg} + \frac{4}{9} P_{qg}$, as a function of x, from
different groups P5U]. NLO is the standard fixed order NLO result, NLL is from [230], CCSS from
[231] and ABF from [232].



input PDFs should be fit and splitting functions and coefficient functions calculated. On this basis
there has been good progress in incorporating $\ln(1/x)$ resummation from essentially three groups
[230, 231, 232], with results roughly in agreement, despite some differences in technique. In order to achieve
stable results, additional effects such as running coupling effects [233] and (depending on the group) other
corrections such as resummation of dominant collinear logarithms [234] are included. A comparison of
the leading (mainly gluon) splitting function to the standard NLO result is shown in Fig. 52.
It is a common result that the small-x resummation leads to a dip for $x \sim 10^{-3}$ before the expected rise
at very low x in splitting functions and coefficient functions (though a full set of coefficient functions
is still to come in some cases). A recent review of this work can be found in [153]. There are also
approaches which attempt to predict the full structure functions, rather than just coefficient functions
and splitting functions, e.g. [235, 236], though this must necessarily introduce some assumption about,
or modelling of, the infrared physics. Results are encouraging, but it is more difficult to directly relate
the PDFs and structure functions obtained to the standard fixed order ones using this type of approach.

A fit to data at NLO plus NLO resummation (in DIS scheme to NLO and a DIS-type resummed
scheme beyond) with full resummation for heavy quarks included has been performed [230]. It leads to
a significant improvement in the fit to HERA data within a global fit and a change in the extracted gluon
(Fig. 53), making it steeper at low $Q^2$ and consequently slightly larger than the fixed-order gluon below
x = 0.005 at higher $Q^2$. Together with indications from Drell-Yan resummation calculations [237], this
suggests that at least a few percent effect due to small-x resummation at the LHC is quite possible, even
for W and Z bosons at higher rapidity. The resummed fit also produced a prediction for the HERA
data on $F_L(x,Q^2)$ at low $Q^2$ [126, 127, 128]. The results are seen in Fig. 54. The prediction is clearly
successful, and gives additional evidence that resummation may be important. However, there are other
possible ways of explaining the excess over the NLO and NNLO perturbative predictions at small x,
which also means low $Q^2$, and this leads us to the next topic.

4.4 Low $Q^2$, Higher Twist, Saturation

As noted in Section 1.2, the factorisation into coefficient functions and parton distributions is formally
broken by corrections of $\mathcal{O}(\Lambda_{QCD}^2/Q^2)$. This effect is expected to be enhanced at high values of x,
and is related to the resummation of the $\alpha_s^n \ln^{2n-1}(1-x)$ perturbative corrections [238, 239, 240], which
is formally divergent, leading to an ambiguity in the perturbation series which can be interpreted as
power corrections or infrared renormalons [53, 54]. There have been numerous studies of higher twist
contributions at high x, e.g. [241, 143, 229, 242, 243, 216], and there is general agreement in the results.
All studies find that the higher twist effects for $F_2$ and $F_3$ appear to be as expected from renormalon
calculations, seen in Fig. 55, i.e. most important for high x. They diminish with the perturbative order,
i.e. at lower order they are mimicking the missing higher order effects, and appear to be stabilising
in size by NNLO. Indeed, in those studies which have approximations to NNNLO [241, 242, 216] there is little
difference, within uncertainties, to the higher twist extracted at NNLO. Hence, the series presumably
reaches maximal convergence near NNNLO at high x. For $F_3$ higher twist is a slightly larger effect at
moderate x ~ 0.01-0.1, and this certainly seems to be the case in Fr [143, 244], as predicted by the
renormalon calculation [245], which has no protection from any sum rule at small x, as the Adler sum
rule provides for nonsinglet $F_2$. This nonsinglet higher twist correction for $F_L$ (which is unrelated to
the gluon distribution) is another possible explanation (at least in part) of the apparent low-$Q^2$ results
in [126, 127, 128].

Figure 53: Comparison of the gluons from

Figure 55: Predictions for the power corrections to $F_2(x,Q^2)$ (solid) and Fi(x,Q^2) and
F'i(x,Q^2) (dashed) [54] compared to an extraction from data.

At low x there has long been an expectation that higher twist effects related to the gluon should
be important, mainly due to the fact that the gluon distribution is expected to be very large and
hence recombination [246] and ultimately saturation effects should set in. Empirical investigations have
suggested [229] that this is not the case, with higher twist effects again diminishing with order, but not
being very significant beyond the LO estimate. Additionally, a study of absorptive corrections [247] does
not imply a very big effect. It has been suggested that this is due to an accidental cancellation of large
terms in $F_2(x,Q^2)$, e.g. [248], and that large gluon-induced higher twist will persist in other structure
functions. However, it may also be related to the fact that the small-x low-$Q^2$ gluon extracted from full
NLO and NNLO fits is actually not nearly as large as once expected, or often assumed in attempts to
calculate higher twist, which are often built upon LO perturbative expansions where the gluon is much
larger.

The subject of the gluon at small x, and the degree of saturation, has become a very large topic of
study, inspired by the discovery that within the dipole model for DIS scattering a simple model
incorporating saturation could give a good fit to both inclusive and diffractive structure function data [249].
This has branched out into the colour glass condensate [250] approach to the small-x gluon and is far
too extensive a topic to summarise here. A brief discussion and summary of recent results and fits
using the dipole model, saturation effects and the colour glass condensate can be found in [251], and a
more recent review of the colour glass condensate in particular can be found in [252]. Here we simply
note a couple of points. More recent and sophisticated treatments, including certainly heavy quarks
(sometimes missed in early studies) and impact parameter dependence, seem to find saturation being
associated with rather lower x and $Q^2$. The saturation scale for various treatments is illustrated
in Fig. 56. It is seen to be at very low x even for b = 0, falling to even lower x as b rises (the average
for inclusive processes is b ~ 2-3 GeV$^{-1}$). For truly quantitative results it is also necessary to match
the calculations in these approaches to the PDFs obtained at higher x and from the reliable results
using the collinear factorisation theory, which is by no means trivial and certainly not automatic [254].

Figure 56: The line denoting the saturation scale as a function of x and $Q^2$ for various
approaches and different values of impact parameter. See [253] for details.
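The idea of a simple saturating dipole model can be sketched as follows. The functional form is of the Golec-Biernat-Wüsthoff type, $\sigma(r,x) = \sigma_0\,[1 - \exp(-r^2 Q_s^2(x)/4)]$ with $Q_s^2(x) = (x_0/x)^\lambda$ GeV$^2$. The parameter values below are assumed, rounded numbers of the order found in early fits of this kind, chosen for illustration only, and the impact parameter dependence discussed above is ignored.

```python
from math import exp

X0 = 3.0e-4       # assumed reference x at which Q_s^2 = 1 GeV^2
LAMBDA = 0.29     # assumed growth exponent of the saturation scale
SIGMA0_MB = 23.0  # assumed normalisation in millibarn


def saturation_scale_sq(x):
    """Q_s^2(x) in GeV^2: rises slowly as x decreases."""
    return (X0 / x) ** LAMBDA


def dipole_cross_section(r, x):
    """Dipole cross section in mb for dipole size r (GeV^-1) at Bjorken x.

    For small r it grows like r^2 (colour transparency); for r well above
    2/Q_s it saturates at SIGMA0_MB.
    """
    return SIGMA0_MB * (1.0 - exp(-0.25 * r * r * saturation_scale_sq(x)))


if __name__ == "__main__":
    for x in (1e-2, 1e-3, 1e-4, 1e-5, 1e-6):
        print(f"x = {x:.0e}  Q_s^2 = {saturation_scale_sq(x):.2f} GeV^2")
```

With these assumed numbers the saturation scale only reaches a few GeV$^2$ at very small x, which is the qualitative point made in the text about saturation being associated with rather low x and $Q^2$.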

5 PDFs for LO Monte Carlo generators 

A recent development in the study of PDFs has been the introduction of a different definition of parton
distributions generally known as modified LO PDFs. These have arisen due to the frequent need to
use generators for events at particle colliders which perform the cross section calculation only at LO
in QCD. LO cross sections combined with LO PDFs are often a very inaccurate approximation, usually
being rather too small in normalisation and sometimes also having the wrong shape. This can easily be
understood if one considers that NLO matrix elements (and beyond) often give large positive corrections:
at small x due to 1/x divergent terms in the matrix elements; near threshold due to large corrections
from soft-gluon emissions near the edge of phase space; and there can be numerically large corrections
from analytic continuation from the space-like to the time-like region, e.g. a $(1 + \alpha_s \pi C_F/2)$ factor in
Drell-Yan production. Cross sections for hadro-production of W, Z and Higgs bosons, $t\bar{t}$ and $b\bar{b}$ production and
jet production (including W/Z + jets) all have NLO enhancements from at least one of these sources.
t-channel processes do not have these types of large corrections, and for e.g. single top or Higgs via
vector boson fusion the NLO matrix-element correction is small. Such processes usually probe partons
in the region of x = 0.1, i.e. neither very large nor very small.

The use of NLO PDFs in LO Monte Carlo generators has been suggested to counter this, since NLO
PDFs are larger in some regions than at LO. Sometimes it does lead to better results, but sometimes to even
worse ones, particularly at small x where NLO PDFs, especially the gluon distribution, can be much smaller
than at LO because they have been extracted with large positive contributions to quark evolution
included at NLO. An alternative argument is that, rather than the normal fixed order definitions, this
situation requires the introduction of a new type of modified LO (LO*) PDF [255]. These allow the LO
PDFs to be generally bigger by allowing momentum violation in global fits performed at LO, and can
also use the NLO definition of $\alpha_s$, which is larger at low scales, where much of the DIS data is fit, than
a LO $\alpha_s$ with the same value at $m_Z^2$. As a further development one can also make the evolution more
"Monte Carlo like", by changing the renormalisation scale in the coupling from $Q^2$ to something more
like $p_T^2$, resulting in the LO** distributions [256]. In both cases the PDFs are obtained entirely from a
fit to existing data, the quality of the LO* fit being much better than the rather poor LO result, but
not quite as good as NLO. It was hoped that in the modified PDFs the enhancement in the partons
compared to standard LO should compensate to some extent for the missing NLO enhancements in the
matrix elements. In [255] there was an extensive investigation of whether this idea works in practice.
Comparison was made between predictions for a wide variety of processes made using the MC@NLO
generator [257], which combines NLO matrix elements with parton showering, using NLO PDFs, and
predictions using LO generators and LO, NLO and LO* PDFs. Taking the MC@NLO results to be the
most accurate representation, it was found that for the vast majority of cases the LO* PDFs gave the best
results for the LO generators, particularly for gluon initiated processes, while sometimes standard LO
and sometimes NLO gave the worst results. As an example we see in Fig. 57 the final state distributions
for single b and $b\bar{b}$ pairs [255], where the results using the LO* PDFs are almost identical to NLO.

Figure 57: Comparison of the predictions for $b\bar{b}$ production using a LO generator and various
PDFs and a NLO generator with NLO PDFs [255]. The upper plots are absolute cross-sections
with the variety of combinations of order of matrix element (either ME(LO) or
ME(NLO)) and type of PDF (LO, LO* and NLO). The lower plots are the ratio of the full
NLO prediction to each prediction using LO matrix elements and different PDFs.

Figure 58: Comparison of the up quark in conventional LO and NLO sets to the MRST
LO* set [255], either absolute value (upper) or ratio to NLO (lower), each with the NLO
uncertainty included (left) and the CTEQ modified LO sets [258] as a ratio to NLO (right).
In the latter case pink gives the MC@NLO result, green, blue, red and black the result using
a LO generator and LO*, LO**, NLO and LO PDFs respectively.

Figure 59: Comparison of the $W^-$ rapidity distribution using conventional LO and NLO sets to that using
the modified LO sets from CTEQ [258] (left) and MRST [256] (right).
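The statement that the NLO definition of $\alpha_s$ is larger at low scales than a LO coupling with the same value at $m_Z$ can be checked with a small numerical sketch. It integrates the standard one-loop and two-loop renormalisation group equations downwards from $m_Z$. Keeping the number of flavours fixed at five with no quark thresholds, and the input value $\alpha_s(m_Z) = 0.118$, are simplifying assumptions.

```python
from math import log, pi

M_Z = 91.1876  # GeV


def beta_coefficients(n_f):
    """Coefficients in d(alpha_s)/d ln(Q^2) = -b0 alpha_s^2 - b1 alpha_s^3."""
    b0 = (33.0 - 2.0 * n_f) / (12.0 * pi)
    b1 = (153.0 - 19.0 * n_f) / (24.0 * pi**2)
    return b0, b1


def run_alpha_s(q, alpha_mz=0.118, n_f=5, two_loop=True, steps=2000):
    """Evolve alpha_s from M_Z to the scale q (GeV) with RK4 in t = ln(Q^2)."""
    b0, b1 = beta_coefficients(n_f)
    if not two_loop:
        b1 = 0.0  # one-loop (LO) running only

    def beta(a):
        return -(b0 * a * a + b1 * a**3)

    h = log(q * q / (M_Z * M_Z)) / steps  # negative when running downwards
    a = alpha_mz
    for _ in range(steps):
        k1 = beta(a)
        k2 = beta(a + 0.5 * h * k1)
        k3 = beta(a + 0.5 * h * k2)
        k4 = beta(a + h * k3)
        a += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return a


if __name__ == "__main__":
    for q in (2.0, 5.0, 20.0, M_Z):
        lo = run_alpha_s(q, two_loop=False)
        nlo = run_alpha_s(q, two_loop=True)
        print(f"Q = {q:7.2f} GeV  LO: {lo:.4f}  NLO: {nlo:.4f}")
```

Because the two-loop term makes the coupling run faster, the NLO curve lies above the LO one everywhere below $m_Z$, with the difference largest at the low scales where much of the DIS data sits.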

The LO* MRST PDFs have been followed by parton distributions for event generators from the
CTEQ collaboration [258]. The reasoning for the need for these PDFs is the same as for the LO* sets.
The manner of obtaining them follows some of the same principles, but also has some different
ones. Various sets have been produced: CT09MCS, CT09MC1 and CT09MC2. Some (CT09MC1 and
CT09MC2) do allow the violation of the momentum sum rule, and in CT09MC2 the NLO definition
of the coupling is used. A major difference is that the PDFs are obtained by fits also including LHC
"pseudodata" generated using full NLO calculations. This is different in philosophy to the LO* sets.
It is noted that there is significant tension between the best fits to pseudodata and to the existing,
largely structure function, data. There is no modification of the scale of the coupling to make it more
Monte Carlo like, but scales are varied to obtain the best quality fit to the pseudodata. An example
of the comparison of the modified PDFs to standard LO and NLO PDFs is shown for the up quark in
the right of Fig. 58. Clearly when momentum is violated the CT09MC quark is much larger than the
fixed order one over most x values. This is unlike the MRST LO*/LO** quark in the left of Fig. 58, which
is constrained at x = 0.01 to give a good fit to high-$Q^2$ HERA data. Inclusion of existing Tevatron
data on Z rapidity, which postdates the LO* set, would add some tension and should raise the quark
distribution in this region by a small amount. The gluon distributions in the two approaches are more
similar, both being much bigger than fixed order at small x. This is not surprising since the lack of
direct constraint on the gluon distribution renders the differences in the approaches less important.

The comparison of the two types of PDF for gluon initiated processes is very similar. There is
more difference in quark driven processes, such as W and Z production. An example is shown in Fig.
59, which shows the W rapidity distribution. The left-hand figure is from [258], and shows excellent
agreement using CT09MC2, which is not surprising since the PDF has been obtained by fitting data of
this type very well. The MRST LO* looks less successful in this plot. However, these results are from an
inclusive calculation with no parton showering as applied in generators. This can have a few percent
effect, as seen in Section 3 of [258], and automatically includes some higher orders in the LO calculation.
The right-hand plot from [256] does include parton showering in both LO and NLO calculations. The
LO* results are somewhat nearer to NLO in this case since the LO calculation is a little nearer to NLO
in this framework. Applying parton showering to the left-hand plot must, on this evidence, improve
the LO* comparison, and will affect the CT09MC2 comparison to some extent as well. There is no
comparison to t-channel processes in [258]. In [255] the enhancement of the PDFs led to these being
marginally worse than LO PDFs. Even further enhancement is unlikely to be helpful.

An entirely different alternative is to obtain PDFs from fits using Monte Carlo generators directly.
In detail this will then produce a slightly different PDF set for each generator, the details of parton
showering differing between each. Work on this approach has been ongoing. However, it is rather more
time intensive than normal fits, though there have been helpful developments [259], and results are so
far limited. As noted near the beginning of Section 2 of this article, a very wide variety of data needs
to be fit to provide true constraints on any PDFs, so it is not clear if this approach will lead to useful
results in the immediate future.

6 Outlook 

This review demonstrates the vast amount of progress that has taken place in recent years on pinning
down the PDFs of the proton, as well as the dramatic increase in awareness of the impact of PDFs
on the physics program of the LHC experiments. The LHC will need the best PDFs, especially for precision
measurements, setting of limits in searches, and even for discoveries. Ideally the ATLAS and CMS (and
LHCb and ALICE) analyses should follow a common procedure for using PDFs and their uncertainties
in their key analyses. Also, frequently changing the PDFs in the software of the experiments, e.g. for
cross-checks or the determination of error bands, is often non-trivial (e.g. due to the inter-connection
with parameter choices for underlying event modelling, showering parameters and so on) and sometimes
impractical if CPU-intensive detector simulations are involved. LHC studies will therefore need both
good central values for the PDFs to start with, and a good estimate of the associated uncertainties.

This has triggered the so-called PDF4LHC initiative. PDF4LHC offers a discussion forum for PDF
studies and information exchange between all stakeholders in the field. More details and links to the
meetings so far can be found on the PDF4LHC web site [260]. Apart from getting the best PDFs,
including the PDF uncertainties, based on the present data, another important deliverable is to devise
strategies to use future LHC data to improve the PDFs. All this needs a close collaboration between
theorists and those who are preparing to make the measurements. Such measurements include W and
Z production and asymmetries, di-jet production, hard prompt photons, Drell-Yan production, bottom
and top quark production, Z-shape fits and Z+jets measurements. One expects that some of these
channels can already be studied with first data at the LHC.

The final HERA data of run II (2004-2007) are still being analysed and will be very instrumental for
future PDF fits, particularly for the high-$Q^2$ region. These data will become available in the next few
years. Meanwhile interest is growing worldwide in a novel electron-ion collider. Design concepts exist
at CERN, with ideas to intersect an electron accelerator with the Large Hadron Collider (LHeC) [261],
and in the U.S. to either add an electron accelerator to the Relativistic Heavy Ion Collider at BNL,
or an ion accelerator to the upgraded 12-GeV Continuous Electron Beam Accelerator Facility at JLab.
The US project is generically called the EIC [262].

Figure 60: The kinematic reach for LHeC

The LHeC has two alternative scenarios: a ring-ring (RR) scenario and a linac-ring (LR) scenario.
For the RR scenario one typically has a 50 GeV electron beam on a 7 TeV proton beam (or 2.75 TeV
heavy ion beam), and a peak luminosity of around $5 \times 10^{33}$ cm$^{-2}$s$^{-1}$ for 50 MW power. The LR scenario
has the potential to reach larger electron energies, perhaps up to 150 GeV, but in general the total
integrated luminosity will be a factor of 5 to 10 lower than for the RR option.

The EIC projects discussed by BNL and JLab describe an electron beam of 4 to 20 GeV on a proton
beam of 50 to 250 GeV. The peak luminosity aimed for is similar to that of the RR LHeC option. Polarization
is an integral part of the proposal, aiming for 70% polarization for each beam.
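The beam energies quoted above translate directly into centre-of-mass energies and into the smallest accessible x. The sketch below uses the massless approximation $s = 4 E_e E_p$ for head-on beams and the DIS relation $Q^2 = x y s$. The choice $Q^2 = 1$ GeV$^2$ with $y = 1$ as the edge of the kinematic plane is an illustrative assumption, and detector acceptance is ignored.

```python
from math import sqrt


def cms_energy(e_lepton, e_hadron):
    """sqrt(s) in GeV for head-on beams, neglecting masses: s = 4 E_e E_p."""
    return 2.0 * sqrt(e_lepton * e_hadron)


def min_x(q2, e_lepton, e_hadron, y_max=1.0):
    """Smallest Bjorken x reachable at a given Q^2 (GeV^2), from Q^2 = x y s."""
    s = cms_energy(e_lepton, e_hadron) ** 2
    return q2 / (y_max * s)


if __name__ == "__main__":
    configs = {
        "LHeC RR (50 GeV x 7 TeV)": (50.0, 7000.0),
        "LHeC LR (150 GeV x 7 TeV)": (150.0, 7000.0),
        "EIC (20 GeV x 250 GeV)": (20.0, 250.0),
        "EIC (4 GeV x 50 GeV)": (4.0, 50.0),
    }
    for name, (e_e, e_p) in configs.items():
        print(f"{name:28s} sqrt(s) = {cms_energy(e_e, e_p):7.1f} GeV  "
              f"x_min(Q^2 = 1 GeV^2) = {min_x(1.0, e_e, e_p):.1e}")
```

The RR LHeC configuration gives a centre-of-mass energy a little above 1 TeV, roughly an order of magnitude beyond the highest EIC configuration listed here, which is what extends the reach to much smaller x.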

The kinematic reach covered by the LHeC is shown in Fig. 60 and for the EIC in Fig. 61. The high
energy of the LHeC will allow the exploration of a new kinematic area for ep collisions, with values and x
values down to a few times 10^^. The high luminosity anticipated for the EIC and the possibility of
polarized beams will allow for a number of precise and novel measurements. It will however take us into
the next decade before any future DIS data with much higher precision or larger kinematic domain will
be available.

Further possible future constraints may come from the JLab experiments for the high-x range with
the new high intensity 12 GeV electron beam upgrade. The MINERvA experiment will use neutrino
beams on nuclear targets at FNAL and is set to make precise measurements of neutrino cross sections
with neutrino beams of energies up to roughly 30 GeV. MINERvA will collect 6M events on a carbon
target in the transition (not so deep DIS) and DIS regions, plus an additional 6.5M events on four
nuclear targets, and will significantly increase the existing neutrino data set available to the community.
E906 is a new experiment at FNAL and is set to measure Drell-Yan production via muons. E906 will
measure the ratio of the $\bar{d}$ to $\bar{u}$ distributions in the proton and the modifications to the quark sea in a
nucleus. The expected statistics that will be collected are a factor of 50 larger than those of E866/NuSea.
The incoming proton beam energy will be only 120 GeV, reducing the centre-of-mass energy squared s by a
factor of 7 with respect to E866/NuSea. Thus the measurements of E906 will cover a different kinematic range,
namely up to high x values of 0.5. The run of E906 is scheduled to start in 2010 and to last for 2 years.
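The factor of about 7 in s follows from fixed-target kinematics, as the following sketch shows. It assumes a proton beam on a proton target at rest and takes the E866/NuSea beam energy to be 800 GeV, a value not quoted in the text above.

```python
M_P = 0.938272  # GeV, proton mass


def fixed_target_s(e_beam, m_target=M_P, m_beam=M_P):
    """s = m_beam^2 + m_target^2 + 2 E_beam m_target for a target at rest."""
    return m_beam**2 + m_target**2 + 2.0 * e_beam * m_target


def x_product(pair_mass, e_beam):
    """x1 * x2 = M^2 / s for a Drell-Yan pair of invariant mass M (GeV)."""
    return pair_mass**2 / fixed_target_s(e_beam)


if __name__ == "__main__":
    s_e906 = fixed_target_s(120.0)  # E906 beam energy from the text
    s_e866 = fixed_target_s(800.0)  # assumed E866/NuSea beam energy
    print(f"s(E866) / s(E906) = {s_e866 / s_e906:.2f}")
    print(f"x1*x2 at M = 5 GeV: E906 {x_product(5.0, 120.0):.3f}, "
          f"E866 {x_product(5.0, 800.0):.3f}")
```

At a fixed dimuon mass the product $x_1 x_2$ is larger by the same factor at the lower beam energy, which is why E906 reaches higher x than E866/NuSea.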

All these new data, together with ongoing theory developments, will ensure that further improvements
in the understanding and precision of the proton's structure will continue in the next 1-2 decades.

7 Conclusions 

Structure function measurements are a measure of the partonic structure of the proton and are
instrumental in parton distribution function fits. The PDFs allow us to predict cross sections at particle
colliders, and a good knowledge of PDFs and their uncertainties is of prime importance for the success
of the physics program of e.g. proton-proton collisions at the newly commissioned LHC collider.

Figure 61: The kinematic reach for EIC

Several versions of fits to data relevant for the hadron structure exist, and all show that overall good
quality fits using NLO or NNLO QCD can be obtained. Apart from the central values of the PDFs, it is
essential also to have a good understanding of the uncertainties. Various ways of looking at uncertainties
have been discussed in this paper, as used by the different fitting groups. The uncertainties are naively
rather small, about 1-5% for a large number of PDFs and the predicted LHC quantities. Measurements
of ratios, e.g. $W^+/W^-$ ratios, can give extremely tight constraints on parton distributions. However,
there are many effects in the fitting procedure that can contribute to the uncertainties, e.g. effects
from input assumptions, in particular the selection of data fitted, cuts applied to the data and input
parameterisation choices, which can shift the central values of predictions significantly and affect the size of
uncertainties. During the last few years it has also become clear that a complete heavy flavour treatment
is essential in the extraction and use of the PDFs. Furthermore PDFs and $\alpha_s$ are correlated and the
uncertainties must be considered in tandem. These are all part of the fixed order QCD analysis,
but there are additional effects. Electroweak corrections are often neglected but can potentially be
large at very high energies, growing as $\ln^2(E^2/M_W^2)$. Equally, care must be taken over errors from higher
orders/resummation, and power corrections/higher twist, which can be potentially large in certain
phase space regions. Direct measurements of $F_L(x,Q^2)$ at HERA now give us some scope to test these
predictions.

Since there is a spread in the PDFs obtained by the "global" fits which is actually somewhat
larger than the quoted uncertainties of each, some procedure is required to estimate the "true"
uncertainty associated with PDFs. If one is comparing to a measured cross-section then, of course, the
comparison can and should be made to the prediction using any PDF set. In fact it is ideal to check
as widely as possible to help determine which PDFs are most accurate in their predictions. However,
if one is trying to determine the best value and uncertainty on a prediction in order to help set limits,
determine the significance of a signal, or estimate uncertainty from extrapolation into certain regions
of phase space, there is the need for some recommendation for a best prediction and a conservative,
but representative uncertainty. This was requested, for example, for making benchmark predictions
for Higgs boson cross-sections [263], and has been a major focus of the efforts of the afore-mentioned
PDF4LHC group. If very high or very small x PDFs are probed, the lack of significant data constraints
in some PDFs can lead to big variations. The minimum variation seems to be for x = 0.001-0.01,
since all fits include HERA data, but also sum rules impose crossing points. However, the spread of the
predictions [TTOl I264[ 1171] can be significant, even when the PDFs probed are in this x range, as seen
in FigHSl. Hence, it is the interim PDF4LHC recommendation [260, 265] that at NLO a conservative
uncertainty should be the envelope of the predictions using NNPDF2.0, CTEQ6.6 and MSTW08 PDFs
and their uncertainty, including that due to variations in $\alpha_s(m_Z^2)$. The centre of the envelope can be
taken as the best prediction. An example of predictions using this is seen in Table 1. Since of these
three sets MSTW is the only one available at NNLO at present, the central value should be taken from this,
but the fractional uncertainty should be the same as at NLO in order to be conservative, as also seen in
Table 1. NNLO sets are available from ABKM, JR and most recently HERA, as described earlier in
this article, and so ideally checks should also be made against these, though some deviations can be
large. As seen, the NNLO corrections themselves can be large for some processes. This recommendation
is to be viewed as temporary and updates are expected. Another aim of the PDF4LHC group is to
investigate, and hopefully minimise, the spread between different sets, or at least to understand it as
fully as possible. This will be aided by additional data as well as theoretical improvements, and will
also inform future recommendations.

Table 1: Cross-section predictions and uncertainties at the LHC at NLO and NNLO using
the PDF4LHC prescription for the central value and uncertainties. (Note the NNLO result
for the $t\bar{t}$ cross-section in the table, marked *, uses the NLO matrix elements. An approximation
including a variety of NNLO contributions is available in [266], but if NNLO PDFs are
used the correction to NLO at the LHC is only a couple of percent, small compared to the
uncertainty, though it is larger at the Tevatron.)

                          7 TeV centre of mass energy (NLO)    7 TeV centre of mass energy (NNLO)
$\sigma(W^+)$ [nb]        54.7 ± 2.5                           56.9 ± 2.6
$\sigma(W^-)$ [nb]        37.8 ± 2.0                           40.0 ± 2.1
$\sigma(Z)$ [nb]          27.6 ± 1.1                           28.9 ± 1.2
$\sigma(t\bar{t})$ [pb]   162 ± 14                             169* ± 15
$\sigma(H_{120})$ [pb]    12.06 ± 0.75                         15.7 ± 0.98
$\sigma(H_{180})$ [pb]    5.04 ± 0.32                          6.53 ± 0.41
$\sigma(H_{240})$ [pb]    2.75 ± 0.20                          3.52 ± 0.26
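The envelope prescription described above can be written down in a few lines. This is a minimal sketch assuming symmetric uncertainties for each PDF set. The function names and the three input predictions in the usage example are hypothetical, and only the last check uses numbers from Table 1.

```python
def envelope(predictions):
    """Combine (central, uncertainty) pairs, one per PDF set.

    Returns (midpoint, half_width) of the band spanned by all
    central +/- uncertainty intervals.
    """
    upper = max(c + u for c, u in predictions)
    lower = min(c - u for c, u in predictions)
    return 0.5 * (upper + lower), 0.5 * (upper - lower)


def nnlo_from_nlo(nnlo_central, nlo_midpoint, nlo_half_width):
    """Attach the NLO fractional uncertainty to an NNLO central value."""
    return nnlo_central, nnlo_central * nlo_half_width / nlo_midpoint


if __name__ == "__main__":
    # Hypothetical NLO predictions from three PDF sets (arbitrary units).
    toy = [(10.0, 0.5), (10.4, 0.4), (9.8, 0.6)]
    mid, half = envelope(toy)
    print(f"envelope: {mid:.2f} +/- {half:.2f}")

    # Consistency check with the W+ row of Table 1.
    central, err = nnlo_from_nlo(56.9, 54.7, 2.5)
    print(f"NNLO: {central:.1f} +/- {err:.1f}")
```

The envelope is deliberately conservative: it can be wider than any individual uncertainty when the central values of the sets differ, which is the situation the recommendation is designed to cover.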

Hence, as well as testing for Beyond the Standard Model physics, the LHC can add significantly to
our knowledge of the proton structure, especially at low x, through measurement at high rapidities of
hard probes, e.g. W, Z and Drell-Yan events. Indeed, the former is largely a precursor for the latter.
The extraction of PDFs from existing data and their use for the LHC is still a far from straightforward procedure.
We currently have the bare minimum of constraints required for constraining all PDFs over the full
range required, and a full understanding of PDF uncertainties related to experimental errors is still
being developed. In addition there are many theoretical issues to consider to obtain real precision in
some cases. At the LHC there will be relatively few cases where Standard Model discrepancies will
not require some significant, and decidedly nontrivial, input from PDF physics to determine their true
significance.